Abstract

In this paper, the asymptotic performance of the lattice sequential decoder for LAttice Space-Time (LAST) 
coded MIMO channel is analyzed. We determine the rates achievable by lattice coding and sequential decoding 
applied to such a channel. The diversity-multiplexing tradeoff (DMT) under lattice sequential decoding is derived 
as a function of its parameter — the bias term, which is critical for controlling the amount of computations required 
at the decoding stage. Achieving low decoding complexity requires increasing the value of the bias term. However, 
this is done at the expense of losing the optimal tradeoff of the channel. In this work, we derive the tail distribution 
of the decoder's computational complexity in the high signal-to-noise ratio regime. Our analysis reveals that the 
tail distribution of such a low complexity decoder is dominated by the outage probability of the channel for the 
underlying coding scheme. Also, the tail exponent of the complexity distribution is shown to be equivalent to 
the DMT achieved by lattice coding and lattice sequential decoding schemes. We derive the asymptotic average 
complexity of the sequential decoder as a function of the system parameters. In particular, we show that there exists 
a cut-off multiplexing gain for which the average computational complexity of the decoder remains bounded. 

I. Introduction 

The most important parameters for the information data transmission problem are the rate $R$, the
probability of decoding error $P_e$, the block length $m$, and the complexity (encoding and decoding). It is
well known that the noise introduced by the channel sets a fundamental limit on how much data can
be transmitted through the channel. This limit is called the capacity [1]. For a fixed channel, our goal
is to transmit at rates close to capacity with low probability of decoding error using simple coding and
decoding algorithms and short codes. However, there is a tradeoff. The tradeoff between the performance,
the achievable rate, and the complexity is fundamental and exists in any communication system.

Low complexity capacity-achieving codes exist. A special type of these codes which will be considered 
intensively in this work are constructed based on lattices — a mathematical approach for representing 
infinite discrete points in the Euclidean space [2]. The theory of lattices has become a powerful tool
to analyze many point-to-point digital and wireless communication systems, particularly, communication 
systems that can be well-described by the linear Gaussian vector channel model. The channel model may 
be mathematically expressed as 

y = Bx + e, (1) 

where $x \in \mathbb{R}^m$ is the input to the channel, $y \in \mathbb{R}^m$ is the output of the channel, $e \in \mathbb{R}^m$ is the additive
Gaussian noise vector with entries that are independent, identically distributed, zero-mean Gaussian random
variables with variance $\sigma^2$, i.e., $e \sim \mathcal{N}(0, \sigma^2 I_m)$, and $B \in \mathbb{R}^{m \times m}$ is a matrix representing the channel
linear mapping.

It is the linearity of the channel that makes it a good match to the linearity of lattices. In this paper, we
assume that $x$ is a codeword selected uniformly from a lattice code. Let $\Lambda_c = \Lambda(G) = \{x = Gz : z \in \mathbb{Z}^m\}$
be a lattice in $\mathbb{R}^m$, where $G$ is an $m \times m$ full-rank lattice generator matrix. The Voronoi cell, $\mathcal{V}_x(G)$, that
corresponds to the lattice point $x \in \Lambda_c$ is the set of points in $\mathbb{R}^m$ closer to $x$ than to any other point
$\lambda \in \Lambda_c$, with volume given by $V_c = \mathrm{Vol}(\mathcal{V}_x(G)) = \sqrt{\det(G^T G)}$. An $m$-dimensional lattice
code $\mathcal{C}(\Lambda_c, u_0, \mathcal{R})$ is the finite subset of the lattice translate $\Lambda_c + u_0$ inside the shaping region $\mathcal{R}$, i.e.,
$\mathcal{C} = \{\Lambda_c + u_0\} \cap \mathcal{R}$, where $\mathcal{R}$ is a bounded measurable region¹ of $\mathbb{R}^m$.
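The lattice definition above can be illustrated with a minimal sketch. The generator matrix (a hexagonal lattice), the enumeration bound, and the helper names `lattice_points` and `voronoi_volume` are illustrative assumptions, not part of the paper; the sketch only enumerates a finite window of the infinite lattice and evaluates the Voronoi cell volume $\sqrt{\det(G^T G)}$.

```python
import itertools
import numpy as np

def lattice_points(G, bound):
    """Enumerate x = G z for integer vectors z with entries in [-bound, bound]."""
    m = G.shape[1]
    zs = itertools.product(range(-bound, bound + 1), repeat=m)
    return np.array([G @ np.array(z) for z in zs])

def voronoi_volume(G):
    """Volume of the Voronoi cell, sqrt(det(G^T G))."""
    return float(np.sqrt(np.linalg.det(G.T @ G)))

# Hypothetical example: generator matrix of the 2-D hexagonal lattice.
G = np.array([[1.0, 0.5],
              [0.0, np.sqrt(3) / 2]])
points = lattice_points(G, bound=2)   # finite window of the infinite lattice
vol = voronoi_volume(G)
```

A lattice code would additionally translate these points by $u_0$ and keep only those inside the shaping region; that step is omitted here.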

For the above channel model and assuming $B$ is perfectly known at the receiver, it is well known that
the maximum-likelihood (ML) decoder

$$\hat{x} = \arg\min_{x \in \mathcal{C}} |y - Bx|^2 \tag{2}$$

is the optimal solution that minimizes the word error probability $P_e = \Pr(\hat{x} \neq x)$, where $\hat{x}$ is the output of
the decoder. In such a decoder, the received signal is decoded to the nearest codeword or lattice point inside

1 In this paper, we consider a shaping region $\mathcal{R}$ that corresponds to the Voronoi cell $\mathcal{V}_s$ of a sublattice $\Lambda_s$ of $\Lambda_c$, i.e., $\Lambda_s \subseteq \Lambda_c$. The
generated codes are called nested (or Voronoi) lattice codes (see [37] for more details).



$\mathcal{R}$. Lattice coding and ML decoding achieve the capacity of the channel (1). However, in ML decoding,
searching over the codebook $\mathcal{C}$ is performed by a search algorithm (e.g., the sphere decoder) that takes into
account the shaping region $\mathcal{R}$, which is referred to as boundary control. Due to its exponential complexity,
the implementation of the ML decoder is practically infeasible, and the design of low complexity receivers
that achieve near-optimal performance is considered a challenging problem.

Relaxing the boundary control, or lattice decoding, is believed to reduce the complexity at the expense 
of introducing some error performance degradation. Lattice decoder algorithms reduce the complexity by 
relaxing the code boundary constraint and find the point of the underlying (infinite) lattice closest to the 
received point (which may or may not be a code point). In lattice theory, this is usually referred to as the 
closest lattice point search problem (CLPS) [3], which can be described by

$$\hat{x} = \arg\min_{x \in \Lambda_c} |y - Bx|^2. \tag{3}$$
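As a rough illustration of the search in (3), the following sketch performs an exhaustive closest-point search. It is only a brute-force stand-in under stated assumptions: the true problem is over the infinite lattice, whereas the hypothetical function `closest_lattice_point` restricts the integer coordinates to a box of half-width `bound`, and the matrix `A` stands in for the product $BG$. The number of candidates grows as $(2\,\text{bound}+1)^m$, which is the exponential cost that motivates the tree-search decoders discussed later.

```python
import itertools
import numpy as np

def closest_lattice_point(y, A, bound):
    """Exhaustive search for argmin_z |y - A z|^2 over z in [-bound, bound]^m."""
    m = A.shape[1]
    best_z, best_dist = None, np.inf
    for z in itertools.product(range(-bound, bound + 1), repeat=m):
        dist = float(np.sum((y - A @ np.array(z)) ** 2))
        if dist < best_dist:
            best_z, best_dist = np.array(z), dist
    return best_z, best_dist

# Hypothetical 2-D example; A plays the role of the matrix B G.
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])
y = np.array([0.9, 2.2])
z_hat, dist = closest_lattice_point(y, A, bound=3)
```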

Many researchers have studied the information-theoretic limits of lattice coding and decoding schemes 
for the linear Gaussian vector channel model (see [4] and references therein). For many scenarios that fall
into this class, it has been shown that lattice decoding by itself cannot achieve the capacity of the channel
at any signal-to-noise ratio (SNR). One common way to overcome this deficiency of lattice
decoding is through the use of minimum mean-square error decision feedback equalization² (MMSE-DFE)
[21], [36]. In this case, the above channel model still applies with $B$ representing the feedback filter matrix
of the MMSE-DFE.³ We review the achievable rates of some important channels under MMSE-DFE lattice
decoding that provide the main motivation for the rest of the work.

• The Lattice Coded AWGN Channel: This model corresponds to the case where the noise variance
$\sigma^2 = 1$ and $B = \sqrt{1+\rho}\, I_m$, where $\rho$ is defined as the signal-to-noise ratio (SNR) at the receiver.
One can show that reliable communication is possible as long as we operate at rates

$$R < \log\det(B^T B)^{1/m} = \log(1+\rho),$$

which is the capacity of the AWGN channel. A very interesting approach that may be used to prove

2 For the case of the additive white Gaussian noise channel, the capacity is achieved through the use of the linear minimum mean-square error
estimator of the channel input from the channel output [7].

3 For finite system dimensionality, the additive noise $e$, although it can be shown to have uncorrelated elements, may not be Gaussian.
However, this has no significant effect on the results at the signal-to-noise ratios of interest (see [21] for more details about this topic).



the rate achievability of lattice coding and decoding schemes for such a channel is through the
so-called ambiguity decoder. The lattice ambiguity decoder was originally developed by Loeliger
for the additive white Gaussian noise (AWGN) channel. The result was extended in [21] to the
quasi-static, Rayleigh fading $M \times N$ LAttice Space-Time (LAST) coded multiple-input
multiple-output (MIMO) channel. The same technique will be used in this work to analyze the achievable rate
of other efficient lattice decoders for the quasi-static MIMO channel.
• The LAST Coded $M \times N$ MIMO Channel: We consider a quasi-static, Rayleigh fading MIMO
channel with $M$ transmit antennas, $N$ receive antennas, no channel state information (CSI) at the
transmitter, and perfect CSI at the receiver. The complex baseband model of the received signal can be
mathematically described by

$$Y^c = \sqrt{\rho}\, H^c X^c + W^c, \tag{4}$$

where $X^c \in \mathbb{C}^{M \times T}$ is the transmitted space-time code matrix, $T$ is the codeword length (the number
of channel uses), $Y^c \in \mathbb{C}^{N \times T}$ is the received signal matrix, $W^c \in \mathbb{C}^{N \times T}$ is the noise matrix,
$H^c \in \mathbb{C}^{N \times M}$ is the channel matrix, and $\rho = \mathrm{SNR}/M$ is the normalized SNR at each receive antenna
with respect to $M$. The elements of both the noise matrix and the channel fading gain matrix are
assumed to be independent, identically distributed, zero-mean circularly symmetric complex Gaussian
random variables with variance $\sigma^2 = 1$. The equivalent real channel model can be described as in
(1) with $e \sim \mathcal{N}(0, 0.5\, I_m)$.

An $M \times T$ space-time coding scheme is a full-dimensional LAST code if its vectorized (real) codebook
(corresponding to the channel model (1)) is a lattice code with dimension $m = 2MT$. As discussed
in [21], the design of space-time signals reduces to the construction of a codebook $\mathcal{C} \subseteq \mathbb{R}^{2MT}$ with
code rate $R = \frac{1}{T}\log|\mathcal{C}|$, satisfying the input averaging power constraint

$$\frac{1}{|\mathcal{C}|}\sum_{x \in \mathcal{C}} |x|^2 \le MT. \tag{5}$$

For a fixed non-random channel matrix $H^c$, it has been shown in [21] that the rate

$$R_{\mathrm{LAST}}(\rho, H^c) = \log\det(B^T B)^{1/2T} = \log\det\left(I_M + \rho (H^c)^H H^c\right) \tag{6}$$

is achievable under lattice coding and MMSE-DFE lattice decoding.
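The rate expression $\log\det(I_M + \rho (H^c)^H H^c)$ can be evaluated numerically for one channel realization, as in the sketch below. The function name `last_rate`, the random seed, the antenna counts, and the SNR value are illustrative assumptions; the logarithm is taken to base 2 (bits per channel use), which is also an assumption since the text does not fix the base.

```python
import numpy as np

def last_rate(H, rho):
    """log2 det(I_M + rho * H^H H), in bits per channel use (base 2 assumed)."""
    M = H.shape[1]
    gram = H.conj().T @ H
    # slogdet avoids overflow of the determinant at high SNR.
    _, logdet = np.linalg.slogdet(np.eye(M) + rho * gram)
    return float(logdet / np.log(2))

# Hypothetical 2x2 Rayleigh fading realization with unit-variance complex entries.
rng = np.random.default_rng(0)
M, N = 2, 2
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
snr = 100.0                      # 20 dB, illustrative
rate = last_rate(H, snr / M)     # rho = SNR / M as in the channel model
```

Because the channel is quasi-static, this quantity is random across fading realizations, which is why the analysis is carried out in terms of outage.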



Unfortunately, operating at a rate equal to capacity in the AWGN channel or at a rate equal to $R_{\mathrm{LAST}}$
in the quasi-static MIMO channel is not possible in practice. This is because the ML decoder or the
MMSE-DFE lattice decoder (implemented via sphere decoding algorithms) suffers from high decoding
complexity, especially for large signal dimensions $m$ for which high rates can be achieved. As such,
allowing low computational complexity search algorithms at the decoding stage requires the use of codes
with rates below capacity.

It is well-known that lattice decoders that use sphere decoding algorithms can be considered as a search 
in a tree [18], [24], [26]. Generally speaking, a sphere decoding algorithm explores the tree of all possible
lattice points and uses a path metric in order to discard paths corresponding to points outside the search 
sphere. As an alternative to sphere decoding algorithms, sequential decoders comprise a set of efficient and 
powerful decoding techniques able to perform the tree search. These decoders can achieve near-optimal 
performance without suffering the complexity of the ML or the sphere decoder for coding rates not too 
close to the channel capacity. In this case, it is convenient to define the decoding complexity as the total 
number of nodes visited by the decoder during the search. 

Conventional sequential decoders (e.g., the Fano and Stack algorithms [10], [11]) were originally constructed
as an alternative to the ML decoder to decode convolutional codes transmitted via discrete memoryless
channels while achieving low (average) decoding complexity. Similar to the sphere decoder, the sequential
decoder uses a path metric in order to eliminate a large subset of lattice points that have a low chance of
being extended by the search algorithm.

For a general discrete memoryless channel, the sequential decoder's path metric, termed the Fano
metric, is given by (see [13])

$$\mu(x_1^k) = \log\frac{\Pr(\mathcal{H}(x_1^k))\, p(y_1^k \mid x_1^k)}{p(y_1^k)}, \tag{7}$$

where $\mathcal{H}(x_1^k)$ is the hypothesis that $x_1^k$ forms the first $k$ symbols of the transmitted sequence, $y_1^k$ is the
first $k$ symbols of the received sequence, and $p(\cdot)$ is the probability distribution function. We review the
path metric of some important channels, and discuss the rate-complexity tradeoff achieved by the decoder
and the optimal metric that leads to low decoding complexity.

• The Binary Symmetric Channel: In such a channel, the sequential decoder is used to decode linear
convolutional codes. For $1 \le k \le m$, if $\Pr(\mathcal{H}(x_1^k))$ is uniform over all nodes $x_1^k$ that consist of the
first $k$ components of any valid codeword in the code, the path metric can be written as

$$\mu(x_1^k) = \sum_{i=1}^{k} \log\frac{p(y_i \mid x_i)}{p(y_i)} - bk,$$

where $p(y_i \mid x_i)$ is the channel transition probability, and $b$ is the bias term. The bias is introduced to
favor a longer path, which is closer to the end of the tree and thus more likely to be part of the
optimal code path. Massey [12] proved that at any decoding stage, extending the path with the largest
Fano metric minimizes the probability that the extended path does not belong to the optimal code
path. Moreover, Massey showed that the optimal bias that minimizes the computational complexity
while achieving good error performance is $b = R$, the code rate.

Although sequential decoding algorithms are simple to describe, the analysis of the decoder's
computational complexity is considered difficult. This is because the amount of computation
performed by the decoder attempting to decode a message is random. Therefore, sequential decoding
complexity is usually analyzed through its computational distribution. For codes transmitted at rate
$R$, the computational complexity of the sequential decoder, denoted by $C$, for the above mentioned
channel follows a Pareto distribution [33]:

$$\Pr(C > L) \approx L^{-e(R)}, \quad L \to \infty, \tag{8}$$

where $L$ is the distribution parameter, and $e(R)$ is the tail distribution exponent that is a function of
$R$. Theoretical analysis showed that $e(R) > 1$ as long as $R < R_0$, where $R_0$ is called the channel
computational cut-off rate, which is strictly less than the channel capacity. This means that the average
computational complexity is kept bounded as long as we operate at rates below $R_0$. Therefore, for
many coding and decoding schemes, $R_0$ is considered to be the "practical" capacity of the channel.
• The Lattice Coded AWGN Channel: The problem of detecting and decoding the received signal $y$
is transformed into a tree search algorithm using the QR-decomposition of the code matrix $G$. Let
$Q$ and $R$ be the orthonormal matrix and the upper triangular matrix with positive diagonal elements,
respectively, that correspond to the QR-decomposition of $G$. Assuming $x = Gz$ was transmitted,
one can show that for moderate-to-large SNR, the Fano metric (7) can be expressed as (see [16])

$$\mu(z_1^k) = bk - |y'^k_1 - R_{kk} z_1^k|^2, \quad \forall\, 1 \le k \le m, \tag{9}$$

where $z_1^k = [z_k, \ldots, z_2, z_1]^T$ denotes the last $k$ components of the integer vector $z$, $R_{kk}$ is the lower
$k \times k$ part of the matrix $R$, and $y'^k_1$ is the last $k$ components of the vector $y' = Q^T y$.
There are several works that discuss sequential decoding for the lattice coded AWGN channel [14]-[16].
For such a channel, it is well known that sequential decoding of lattice codes can operate
"efficiently" (with bounded average complexity) at rates below $R_0$, which is only
approximately 1.7 dB away from capacity [15]. It is important to note that for sequential decoding algorithms
that approximate lattice decoding, choosing $b = R$ is not the optimal solution that minimizes the
average complexity. This is due to the infinite number of virtual codewords as seen by the decoder,
and hence the rate $R$ is meaningless. Therefore, one should appropriately select the bias term $b$
so that we attain near-optimal performance while reducing the computational complexity. This may
be achieved by ensuring that the metric along the correct path increases
on average, while it decreases along other paths. In this case, we choose $b$ such that $E_e\{\mu(x_1^k)\} > 0$
(assuming $x$ is the correct path). This corresponds to $b > E\{|e_j|^2\} = \sigma^2$. In fact, the optimal bias
under lattice coding and sequential decoding schemes that minimizes the average decoding
effort was derived in [16] and is given by⁴

$$b = \sigma^2 \log\frac{4}{\pi\sigma^2}, \tag{10}$$

where $\sigma^2$ is the variance of the channel noise.
• The $M \times N$ MIMO Channel: Applying sequential decoders to the detection of signals transmitted
via MIMO communication channels introduced an alternative and interesting approach to solve the
CLPS problem [23] that is related to the optimal decoding rule in such channels [18], [17]. Murugan
et al. [18] showed that lattice sequential decoders, although sub-optimal, are capable of achieving
good, and in some cases near-ML, error performance. The analysis was considered only for the case
of the uncoded MIMO channel (i.e., V-BLAST). It was demonstrated that lattice sequential decoders
4 This optimal bias term was derived based on the work by Poltyrev in [5]. In his work, Poltyrev considered the fundamental limits achieved
by lattice coding and decoding for the unconstrained AWGN channel. A new notion of capacity was considered in his work based on a
characteristic of lattices (the lattice density, to be specific), which is termed the volume-to-noise ratio (see [5] for more details).



achieve the maximum receive diversity provided by the channel, and for low signal dimensions these
decoders achieve near-ML performance while significantly reducing decoding complexity compared
to lattice decoders. Specifically, they showed that for any fixed (large enough) $b$, the sequential
decoder achieves the maximum diversity gain $N$ with computational complexity that scales at most
linearly with the signal dimension $m$ as long as $\rho > \rho_0$, where $\rho_0$ is the minimum SNR required for
the average complexity to remain bounded. It must be noted that $\rho_0$ is a function of $m$ and $b$ and
increases as these parameters increase.

Interestingly, the lattice sequential decoder allows for a systematic approach for trading off
performance for complexity. It was argued in [18] that as $b \to 0$ we achieve the best (lattice decoding)
performance but at the price of high complexity. It has been shown in [18] via simulation that there
exists a value of $b$, say $b^*$, such that for all $b > b^*$, the average computational complexity decreases
monotonically with $b$. As $b \to \infty$, the sequential decoder becomes equivalent to the MMSE-DFE
decoder and the number of visited nodes is always equal to $m$ at any SNR. It is well known [30] that
the MMSE-DFE decoder achieves a diversity gain equal to $N - M + 1$. Therefore, one must expect
that as we vary the bias term from 0 to $\infty$, the diversity gain must change from $N$ to $N - M + 1$.
How the diversity order changes with the bias parameter was not shown in [18]. Moreover, the
performance limits and the complexity achieved by lattice sequential decoders for the (lattice)
space-time coded MIMO channel [21], [31], [32] have not yet been studied. This will be the main
topic of the work presented here.

A. Sequential Decoding for LAST Coded MIMO Channels 

In the quasi-static MIMO channel, achieving higher performance via diversity and higher data rate via
multiplexing requires incorporating error control coding (across antennas and time) at the transmitter (e.g.,
space-time codes). The Diversity-Multiplexing Tradeoff (DMT) [27] has become the standard tool
used to evaluate the performance limits of any coding and decoding scheme applied over outage-limited
MIMO channels. Let the multiplexing gain $r$ be defined as in [27]:

$$r = \lim_{\rho \to \infty} \frac{R(\rho)}{\log \rho},$$

and the diversity gain $d$ be defined as [27]:

$$d = -\lim_{\rho \to \infty} \frac{\log P_e(\rho)}{\log \rho}.$$

With the aid of the MMSE-DFE at the decoding stage, LAST coding and lattice decoding achieve (see
[21]) the optimal tradeoff, denoted by $d^*_{\mathrm{out}}(r)$, of the channel

$$d^*_{\mathrm{out}}(r) = (M - r)(N - r), \quad \forall\, 0 \le r \le \min\{M, N\}. \tag{11}$$

However, lattice decoders implemented via sphere decoding algorithms are only efficient in the high
SNR regime and for low signal dimensions, and exhibit exponential (average) complexity for low-to-moderate
SNR and large signal dimensions [24], [34]. At the other extreme, linear and non-linear receivers such as
zero-forcing, MMSE, and MMSE-DFE decoders are considered attractive alternatives to lattice decoders
in MIMO channels and have been widely used in many practical communication systems [28]-[30].
Unfortunately, the very low decoding complexity that these decoders provide comes at
the expense of poor performance, especially for large signal dimensions. In fact, linear decoders cannot
achieve the optimal DMT. However, with the aid of lattice reduction techniques [19], Jalden and Elia [20]
showed that the optimal tradeoff $d^*_{\mathrm{out}}(r)$ can be achieved using linear decoders at a worst-case complexity
$O(\log \rho)$. This corresponds to a linear increase in complexity as a function of the code rate $R = r \log \rho$ at
high SNR. Unfortunately, as mentioned in [20], this very low decoding complexity comes at the expense
of a "large" performance (or SNR) gap from the lattice decoder's error performance.

The problem of designing low complexity receivers for the MIMO channel that achieve near-optimal 
performance (i.e., with improved SNR gap from ML or lattice decoding) is considered a challenging 
problem and has driven much research in the past years. In this work, we analyze the performance of 
lattice sequential decoding that is capable of bridging the gap between lattice (or sphere) decoders and 
low complexity linear decoders (e.g., MMSE-DFE decoder). 

The problem of detecting and decoding the received signal y is transformed into a tree search algorithm 
using the QR-decomposition on the channel-code matrix BG. Similar to the lattice coded AWGN channel, 
the Fano metric that corresponds to the LAST coded MIMO channel is also given by (7). However, there
are a few differences when it comes to the choice of the sequential decoding parameter $b$. First, as will
be shown in the sequel, for the lattice coded AWGN channel, choosing the optimal value of $b$, provided
in (10), as the bias allows us to operate at rates close to $R_0$ (the cut-off rate). Unfortunately, the fading
nature of the MIMO wireless channel prevents us from selecting an optimal value of $b$ that minimizes the
decoding complexity for all SNR while still achieving near-optimal performance. To illustrate
this point, our results show that the achievable rate of the lattice sequential decoder, denoted by $R_b(\rho, H^c)$,
applied to the LAST coded MIMO channel is equivalent to the achievable rate of MMSE-DFE lattice
decoding offset by a term that depends solely on the bias and increases with $b$, i.e.,

$$R_b(\rho, H^c) = \log\det\left(I_M + \rho (H^c)^H H^c\right) - \Gamma(b),$$

with $\Gamma(0) = 0$. Now, one can see that when the channel is near outage, transmission at non-zero rates
may not be possible for large values of the bias term, for which low decoding complexity is expected. In
order to overcome this problem, we either increase the transmission power, which may not be possible
due to power constraints, or lower the value of the bias $b$ at the price of increasing the decoding
complexity.

It is clear from the above equation that, depending on the channel condition, the bias term should be
adapted accordingly in order to maintain a non-zero achievable rate at any SNR. Now, if $b = 0$, one
achieves the optimal DMT $d^*_{\mathrm{out}}(r)$ but at the expense of increasing the decoding complexity. On the other
hand, if we let $b \to \infty$, we achieve a DMT $d_\infty(r) = (N - M + 1)(1 - r/M)$, which corresponds to the
tradeoff achieved by the MMSE-DFE decoder. Such a decoder has the lowest computational complexity
(equal to $m$) but at the price of very poor performance. Therefore, by varying the bias term from 0 to $\infty$
we obtain different achievable DMT curves. In this paper, we show in detail how the achievable
DMT changes with the bias.
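The two extreme tradeoff curves quoted in the text, $(M-r)(N-r)$ for lattice decoding and $(N-M+1)(1-r/M)$ for the MMSE-DFE decoder, can be compared with a short sketch. The function names and the $2 \times 2$ example are illustrative assumptions; the intermediate curves for finite non-zero bias are not modeled here.

```python
def dmt_lattice(r, M, N):
    """Tradeoff attained in the small-bias extreme: (M - r)(N - r)."""
    if not 0 <= r <= min(M, N):
        raise ValueError("multiplexing gain must lie in [0, min(M, N)]")
    return (M - r) * (N - r)

def dmt_mmse_dfe(r, M, N):
    """Tradeoff attained in the large-bias extreme: (N - M + 1)(1 - r / M)."""
    if not 0 <= r <= min(M, N):
        raise ValueError("multiplexing gain must lie in [0, min(M, N)]")
    return (N - M + 1) * (1 - r / M)

# Hypothetical 2x2 system: tabulate both curves on a coarse grid of r.
M, N = 2, 2
table = [(r, dmt_lattice(r, M, N), dmt_mmse_dfe(r, M, N)) for r in (0, 0.5, 1, 1.5, 2)]
```

For this example the diversity gain at $r = 0$ drops from 4 to 1 between the two extremes, which is the gap that the choice of bias trades against complexity.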

B. Outline of the Main Contributions 

The contribution of this paper can be classified into two classes: the asymptotic performance analysis 
of the lattice sequential decoder in terms of the achievable DMT, and the computational complexity of 
the decoder in terms of the complexity tail distribution and the average complexity. In order to fully 
characterize the achievable DMT of the decoder, we determine for the first time the rates achievable 
by lattice coding and sequential decoding applied to the outage-limited MIMO channel. We derive the 
DMT as a function of the decoder bias term, which is critical for controlling the amount of computation
required at the decoding stage. Achieving low decoding complexity requires increasing the value of the
bias term. However, this is done at the expense of losing the optimal tradeoff of the channel. In terms of 
performance analysis, the work establishes the DMT optimality of fixed-bias lattice sequential decoding. 

We analyze in detail the computational tail distribution of the decoder and its average complexity.
Specifically, we show that, in the high SNR regime, when the computational complexity exceeds a certain
limit, say $L_0$, the tail distribution becomes upper bounded by the asymptotic outage probability achieved
by LAST coding and sequential decoding schemes, i.e.,

$$\Pr(C > L) \,\dot{\le}\, \rho^{-d^*_{\mathrm{out}}(r)}, \quad L > L_0,$$

where $d^*_{\mathrm{out}}(r)$ is the optimal DMT of the channel, and $L_0$ is a random variable that depends on the channel
condition and the code matrix. This interesting result suggests that one may save on decoding complexity
while still achieving near-outage performance by setting a time-out limit at the decoder, so that when the
computational complexity exceeds this limit the decoder terminates the search and declares an error.

Similar to the discrete memoryless channel, our analysis reveals that, for a fixed-bias sequential decoding
algorithm, there exists a cut-off multiplexing gain, denoted by $r_0$, such that the average computational
complexity of the lattice sequential decoder remains bounded as long as we operate below this value. We
argue that, in order to operate at multiplexing gains beyond $r_0$, large values of $b$ must be used. However,
this comes at the price of losing the optimal tradeoff. Hence, the lattice sequential decoder provides a
systematic approach for trading off DMT, cut-off multiplexing gain, and complexity.

Our work is organized as follows. In Section II, we briefly describe the operation of various sequential 
decoding algorithms. In Section III, we investigate the achievable rates of lattice sequential decoders for
the outage-limited MIMO channel, and we derive the general DMT achieved by the decoder as a function 
of its parameter — the bias term. We show how this parameter plays a fundamental role in determining the 
DMT achieved by sequential decoding of lattice codes. The optimality of the lattice sequential decoder for 
the quasi-static MIMO channel is proven for finite bias term. The bias term is responsible for the excellent 
performance-complexity tradeoff achieved by the decoder. Sections IV and V provide complete analysis 
for the computational complexity tail distribution and the average complexity of the lattice sequential 
decoder in the high SNR regime. In Section VI, our theoretical analysis is supported through simulation
results. Finally, conclusions are provided in Section VII.



Throughout the paper, we use the following notation. The superscript $c$ denotes complex quantities,
$T$ denotes transpose, and $H$ denotes Hermitian transpose. We refer to $g(z) \doteq z^a$ as $\lim_{z \to \infty} \log g(z) / \log z = a$;
$\dot{\ge}$ and $\dot{\le}$ are used similarly. For a bounded Jordan-measurable region $\mathcal{R} \subset \mathbb{R}^m$, $V(\mathcal{R})$ denotes the volume
of $\mathcal{R}$, and $I_m$ denotes the $m \times m$ identity matrix. We denote by $\mathcal{S}_m(r)$ the $m$-dimensional hypersphere
of radius $r$ with $V(\mathcal{S}_m(r)) = (\pi r^2)^{m/2} / \Gamma(m/2 + 1)$, where $\Gamma(x)$ denotes the Gamma function.

II. Lattice Fano/Stack Sequential Decoder 

The sequential search on a tree can be briefly described as follows: the search is attempted one branch 
at a time. Namely, if the decoder is "located" at a particular node, it will move forward along the most 
likely branch stemming from it and thus reach a new node, provided that the likelihood of the entire past 
path up to and including the new node exceeds a certain current threshold. If it does not, then the decoder 
must return to the preceding node. From there it will try to move forward along an alternate path. It will 
succeed in this attempt if the value of the likelihood of the new path exceeds a threshold appropriate to 
it. Thus, the decoder moves forward and backward with the hope that the likely paths are going to be 
examined so that the average decoding effort will be kept low. 

Fano and Stack sequential decoders [10], [11] are efficient tree search algorithms that attempt to find
a "best fit" with the received noisy signal. As in conventional sequential decoders, to determine a best fit
(path), the path metric given in (7) is assigned to each node on the tree.

In the Stack algorithm, as the decoder searches the different nodes in the tree, an ordered list of 
previously examined paths of different lengths is kept in storage. Each stack entry contains a path along 
with its metric. Each decoding step consists of extending the top (best) path in the stack and reordering 
the stack list. The decoding algorithm terminates when the top path in the stack reaches the end of the 
tree (refer to [11] for more details about the algorithm).
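The Stack algorithm described above can be sketched as a best-first search with a priority queue. This is a simplified illustration under stated assumptions, not the decoder analyzed in the paper: the branch metric is taken to be $b - (\text{residual})^2$ per level, so that a length-$k$ path has metric $bk - |y'^k_1 - R_{kk} z_1^k|^2$; each node is expanded over a finite candidate set `alphabet` rather than the infinite integer lattice; and the names `qr_positive` and `stack_decode` and the matrix `A` (standing in for the channel-code matrix $BG$) are hypothetical.

```python
import heapq
import numpy as np

def qr_positive(A):
    """QR decomposition A = Q R with a positive diagonal in R."""
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs, signs[:, None] * R

def stack_decode(y, A, bias, alphabet):
    """Best-first (Stack) search; returns the decoded vector and the node count."""
    m = A.shape[1]
    Q, R = qr_positive(A)
    y_rot = Q.T @ y
    # Heap entries are (-metric, path); path lists z_m, z_{m-1}, ... decided so far.
    heap = [(0.0, ())]
    generated = 0
    while heap:
        neg_metric, path = heapq.heappop(heap)   # top (best) path in the stack
        k = len(path)
        if k == m:                               # top path reached the end of the tree
            return np.array(path[::-1], dtype=int), generated
        row = m - 1 - k
        decided = np.array(path[::-1], dtype=float)   # equals z[row+1:]
        residual0 = y_rot[row] - R[row, row + 1:] @ decided
        for s in alphabet:                       # extend the top path by one level
            generated += 1
            branch = bias - (residual0 - R[row, row] * s) ** 2
            heapq.heappush(heap, (neg_metric - branch, path + (int(s),)))
    raise RuntimeError("empty alphabet: no path reached the end of the tree")

A = np.array([[2.0, 0.5],
              [0.3, 1.5]])            # stands in for the channel-code matrix BG
z_true = np.array([1, -2])
y = A @ z_true                        # noiseless observation, for illustration
z_hat, nodes = stack_decode(y, A, bias=0.1, alphabet=range(-3, 4))
```

In this noiseless example the correct path has the largest metric at every level, so the search generates only two levels of children. With noise and a larger bias, the search may terminate earlier on a suboptimal path; a practical implementation would also generate children lazily (best child and next-best sibling) instead of expanding a full candidate set.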

In the Fano algorithm, as the decoder searches nodes, the path metric is compared to a certain threshold
denoted by $\tau \in \{\ldots, -2\delta, -\delta, 0, \delta, 2\delta, \ldots\}$, where $\delta$ is called the step size. The decoder attempts to
extend the most probable path by moving "forward" if the path metric stays above the running threshold.
Otherwise, it moves "backward" searching for another path that may lead to the most probable transmitted
sequence (refer to [10] for more details about the algorithm).

Although the Stack decoder and the Fano algorithm generate essentially the same set of visited nodes
(see [18]), the Fano decoder visits some nodes more than once. However, the Fano decoder requires
essentially no memory, unlike the Stack algorithm. Also, it must be noted that the way the nodes are
generated in both sequential algorithms plays an important role in reducing the computational complexity
and in some cases may improve the detection performance. For example, the determination of the best
and next best nodes is simplified in the CLPS problem by using the Schnorr-Euchner enumeration [26],
which generates nodes with metrics in ascending order given any node $z_1^k$. However, for the
entire paper, and for the sake of simplifying the analysis, we consider the use of the
Stack algorithm in our performance and complexity analysis. The same results also apply to the Fano
algorithm.

III. Outage Performance Analysis 

Our goal in this section is to analyze the DMT achieved by LAST coding and MMSE-DFE lattice
sequential decoding applied to the quasi-static $M \times N$ MIMO channel. The achievable DMT of a particular
coding and decoding scheme in such a channel is usually derived using the outage probability $P_{\mathrm{out}}(\rho, R)$,
which is defined as the probability that the coding rate cannot be supported by the channel. In other words, 
an outage occurs if the coding rate R exceeds the achievable rate of the channel. As such, determining the 
achievable rate under LAST coding and sequential decoding is essential in order to determine the DMT. 

A. Achievable Rate 

As discussed in the introduction, the sequential decoder's output depends critically on the bias term $b$
of the path metric. Therefore, it is to be expected that the achievable rate as well as the outage probability will
depend heavily on this decoding parameter. As discussed in the previous section, rates up to
$\log\det(I_M + \rho (H^c)^H H^c)$ are achievable by lattice coding and decoding. When the lattice decoder is replaced by the
lattice Fano/Stack⁵ sequential decoder, we get the following result:

Theorem 1. For a fixed non-random channel matrix $H^c$, the rate

$$R_b(H^c, \rho) \triangleq \max\left\{\log\det\left(I_M + \rho (H^c)^H H^c\right) - 2M \log\left(\frac{1 + \sqrt{1 + 8\alpha b}}{2}\right),\; 0\right\} \tag{12}$$

is achievable by LAST coding and MMSE-DFE lattice sequential decoding with bias term $b$, where $\alpha$ is
given by

$$\alpha = \left[\frac{r_{\mathrm{eff}}(BG)}{2\, r_{\mathrm{pack}}(BG)}\right]^2, \tag{13}$$

and $r_{\mathrm{eff}}(BG)$ and $r_{\mathrm{pack}}(BG)$ are the effective radius and packing radius of the lattice generated using
$BG$, respectively.

5 For the Fano algorithm, we assume throughout the paper that only small values of the step size $\delta$ are used by the decoder, and hence its
effect on the performance analysis can be neglected (see the proof of Theorem 4). Otherwise, choosing very large values of $\delta$ may result in
very poor performance. For the Stack algorithm, we have $\delta = 0$.

It should be noted that the above theorem applies to the general linear Gaussian vector channel model 
described in (1) with arbitrary $B$. As an example, consider again the lattice coded AWGN (non-fading) 
channel with $B = \sqrt{1+\rho}\, I_m$ under MMSE-DFE lattice sequential decoding. As 
$m \to \infty$, for a well-constructed lattice code ensemble (see footnote 7), it is well known that (see footnote 8) 

2r pack (BG) 
r cS (BG) ~ 



In this case, the achievable rate in (12) reduces simply to 



i4(p)^log(l+p)-21og' 1 x/TTI ^ 



As discussed in the introduction, the optimal bias that achieves the best performance while maintaining 
low complexity decoding was found in [16] and is given in (10). Substituting this value of $b$ in the above 
equation with channel noise variance $\sigma^2 = 0.5$, we get $R_b(\rho) \approx \log(1+\rho) - 1.64$, which is approximately 
2 dB away from capacity. The achievable rate in this case is very close to the computational cut-off 
rate $R_0$ of the AWGN channel. 
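The rate penalty caused by the bias can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes that (12) reads $R_b = \max\{R_{\rm LAST} - 2M\log_2((1+\sqrt{1+8\alpha b})/2), 0\}$ with rates measured in bits, and the function name `achievable_rate` and its arguments are hypothetical.

```python
import math


def achievable_rate(r_last, b, alpha, num_tx):
    """Rate under sequential decoding, assuming the reading of (12) stated above.

    r_last : rate achievable by lattice decoding, in bits (assumed unit)
    b      : bias term of the sequential decoder (b >= 0)
    alpha  : (r_eff / (2 * r_pack))**2, as assumed for (13)
    num_tx : number of transmit antennas M
    """
    penalty = 2 * num_tx * math.log2((1 + math.sqrt(1 + 8 * alpha * b)) / 2)
    # The rate cannot be negative, hence the outer max with zero.
    return max(r_last - penalty, 0.0)


no_bias = achievable_rate(5.0, 0.0, 1.0, 1)      # b = 0: no penalty
with_bias = achievable_rate(5.0, 1.0, 1.0, 1)    # b = 1: penalty of 2 bits
near_outage = achievable_rate(1.0, 1.0, 1.0, 1)  # penalty exceeds the rate
```

Under this reading, the penalty vanishes at $b = 0$ and grows with $b$ until the achievable rate is clipped at zero, which matches the qualitative remarks that follow Theorem 1.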

Some special remarks can be made about Theorem 1. First, it is clear that as $b \to 0$ the achievable 
rate $R_b \to R_{\rm LAST} = \log\det(I_M + \rho (H^c)^H H^c)$. In fact, as will be shown in the sequel, as $b \to 0$ we 
achieve the lattice decoder error performance at any SNR. As $b$ becomes large, it may not be possible to 
transmit data at rates close to $R_{\rm LAST}$. The problem that we may encounter here is that when the channel 
is near outage, for a fixed value of $b$ we may not be able to send data at a positive rate, especially at 
low-to-moderate SNR. In this case, it is highly likely that the decoder will perform erroneous detection. 
However, as $\rho \to \infty$, for any fixed $b$, the offset term that appears in the achievable rate equation becomes 

6 The effective radius of a lattice, $r_{\rm eff}$, is defined as the radius of the sphere with volume equal to the volume of the fundamental (Voronoi) 
region of the lattice. The packing radius $r_{\rm pack}$ is the radius of the largest sphere that is contained inside the Voronoi region of the lattice. 
7 Codes that are constructed using lattices that satisfy the Minkowski-Hlawka theorem (see (6)-(5j for more details) 
8 As discussed in [37], the packing efficiency $\eta_{\rm pack}$, defined as $r_{\rm pack}(G)/r_{\rm eff}(G)$, of a well-constructed lattice ensemble $\Lambda(G)$ is 
asymptotically (as $m \to \infty$) bounded by $0.5 < \eta_{\rm pack} < 0.66$. 



negligible and at very high SNR we have $R_b \approx R_{\rm LAST}$. Therefore, for a fixed bias, although we may not 
achieve close to lattice decoding performance, one should expect the sequential decoder to achieve the 
optimal DMT of the channel, $d^*_{\rm out}(r)$. This is summarized in the following theorem: 

Theorem 2. There exists a sequence of nested LAST codes with block length $T \ge M + N - 1$ that achieves 
the optimal DMT curve $d^*_{\rm out}(r) = (M - r)(N - r)$ for all $r \in [0, \min\{M, N\}]$ under LAST coding and 
MMSE-DFE lattice sequential decoding for any fixed bias $b > 0$. 

Proof: (Sketch) For a fixed bias, we have $R_b \approx R_{\rm LAST} - \gamma$, where $\gamma$ is a constant that depends on $b$. 
Let $R(\rho) = r \log\rho$, where $r$ is the multiplexing gain, and denote by $0 \le \lambda_1 \le \cdots \le \lambda_M$ the eigenvalues of 
$(H^c)^H H^c$. Let $\alpha_i = -\log\lambda_i / \log\rho$. Using the definition of the outage probability in [27], we have 

$$P_{\rm out}(\rho, R) = \Pr(R(\rho) > R_b(\rho)) = \Pr(r\log\rho > R_{\rm LAST}(\rho) - \gamma). \quad (14)$$

The last equation represents the achievable outage probability under ML (or MMSE-DFE lattice) decoding 
(see [27] or [11]). Therefore, at high SNR we have $P_{\rm out}(\rho, r\log\rho) \doteq \rho^{-d^*_{\rm out}(r)}$. According to [27], the 
outage probability serves as a lower bound for the probability of decoding error. Therefore, we have 

$$P_e(\rho) \;\dot{\ge}\; \rho^{-d^*_{\rm out}(r)}. \quad (15)$$

The proof is completed if we show that $P_e(\rho)$ is asymptotically upper bounded by $\rho^{-d^*_{\rm out}(r)}$. This is done 
in Appendix I. 

The above theorem indicates that the use of ML or near-ML receivers (e.g., lattice decoders) is not 
essential if the main goal is to achieve the optimal tradeoff of the channel. Sub-optimal receivers may do 
the job. This result agrees with the work by Jalden and Elia [20], where they considered the use of 
lattice-reduction-aided linear decoders and proved their optimality in the DMT sense. Although these decoders 
achieve very low decoding complexity, they suffer from a large SNR gap from the ML or lattice decoding 
error performance. Lattice sequential decoders, however, allow for a systematic approach for trading off 
performance for complexity. Although a fixed but large value of $b$ achieves the optimal DMT, the 
performance (SNR) gap from the ML or the lattice decoder increases as $b$ becomes large. To achieve 
near-ML performance in this case, one has to resort to low fixed values of $b$. 

A natural question that may be asked at this point is: how large can $b$ be set without losing the 
optimal tradeoff? For fixed (finite) $b$, one cannot capture, in general, the effect of the bias term on the DMT 
achieved by such a decoding scheme. As will be shown in the sequel, in order to do that, we need to allow 
the bias term to vary according to the channel condition and the SNR. 

Now, before proving Theorem 1, we would like to introduce the so-called ambiguity decoder. The 
lattice ambiguity decoder was originally developed by Loeliger in [6], and was used in [11] to prove the 
achievable rate of the MMSE-DFE lattice decoder that is given in (6). The same technique will be used 
in this paper to derive the achievable rate under MMSE-DFE lattice sequential decoding. 

Assume the received vector can be written as $y = x + w$, where $x \in \Lambda_c$ and $w = A^{-1}e$ is an 
$m$-dimensional noise vector independent of $x$, for which $A \in \mathbb{R}^{m \times m}$ is an arbitrary full-rank matrix and 
$e \sim \mathcal{N}(0, 0.5 I)$. The ambiguity decoder is defined by a decision region $\mathcal{E} \subset \mathbb{R}^m$ and outputs $x \in \Lambda_c$ if 
$y \in \mathcal{E} + x$ and there exists no other point $x' \in \Lambda_c$ such that $y \in \mathcal{E} + x'$. An ambiguity occurs if the 
received vector $y \in \{\mathcal{E} + x\} \cap \{\mathcal{E} + x'\}$ for some $x \ne x'$. If we define $\mathcal{A}(\mathcal{E})$ to be the ambiguity event 
for the decision region $\mathcal{E}$, then for a given $\Lambda_c$ and $\mathcal{E}$, the probability of error can be upper bounded as 

$$P_e(\mathcal{E}|\Lambda_c) \le \Pr(e \notin \mathcal{E}) + \Pr(\mathcal{A}(\mathcal{E})). \quad (16)$$

As mentioned in [6], the upper bound (16) holds for any Jordan measurable bounded subset $\mathcal{E}$ of $\mathbb{R}^m$. 
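The decision rule of the ambiguity decoder can be sketched in a few lines. This is only an illustrative toy, assuming an ellipsoidal decision region $\{z : z^T A^T A z \le r^2\}$ and a small finite patch of the integer lattice $\mathbb{Z}^2$ as the candidate set; the names `in_region` and `ambiguity_decode` and all numerical values are hypothetical.

```python
from itertools import product


def in_region(z, A, radius_sq):
    """Return True if z^T A^T A z <= radius_sq, i.e. |A z|^2 <= radius_sq."""
    Az = [sum(a * zj for a, zj in zip(row, z)) for row in A]
    return sum(v * v for v in Az) <= radius_sq


def ambiguity_decode(y, candidates, A, radius_sq):
    """Return the unique candidate x with y - x in the region, else None.

    None covers both failure modes: no candidate matches, or at least
    two candidates match (an ambiguity).
    """
    hits = [x for x in candidates
            if in_region([yi - xi for yi, xi in zip(y, x)], A, radius_sq)]
    return hits[0] if len(hits) == 1 else None


# Toy setup (hypothetical values): identity A and a 5 x 5 patch of Z^2.
A = [[1.0, 0.0], [0.0, 1.0]]
candidates = [tuple(p) for p in product(range(-2, 3), repeat=2)]

decoded = ambiguity_decode((0.2, -0.1), candidates, A, radius_sq=0.3)
ambiguous = ambiguity_decode((0.5, 0.0), candidates, A, radius_sq=0.3)
```

The second call lands exactly halfway between two lattice points, so both translated regions contain the received vector and the decoder declares an ambiguity. These two failure modes correspond to the two terms on the right-hand side of the bound above.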



Consider now the following lemma: 

Lemma 1. There exists an $m = 2MT$-dimensional lattice code $\mathcal{C}(\Lambda_c, u_0, \mathcal{R})$ with fundamental volume $V_c$ 
that satisfies (5), for some fixed translation vector $u_0$, where $\mathcal{R}$ is the $m/2$-dimensional hypersphere with 
radius $\sqrt{MT}$ centered at the origin, such that the error probability is upper bounded as 

$$P_e(\Lambda_c, \mathcal{E}_{T,\gamma}) \le (1 + \epsilon')\, 2^{-T\left[\log\det(A^T A)^{1/2T} - M\log(2 r_e^2/m) - R\right]} + \Pr(e \notin \mathcal{E}_{T,\gamma}), \quad (17)$$

where $\mathcal{E}_{T,\gamma} = \{z \in \mathbb{R}^{2MT} : z^T A^T A z \le r_e^2 (1 + \gamma)\}$, $r_e > 0$, $\gamma > 0$, and $\epsilon' > 0$. 



Proof: See BUI. ■ 
The achievable rate under MMSE-DFE lattice decoding provided in (6) follows easily by letting $A = B$ 
and $r_e^2 = MT$ in the above lemma. In that case, from standard typicality arguments it follows that for 
any $\epsilon > 0$ and $\gamma > 0$, there exists $T_{\gamma,\epsilon}$ such that for all $T > T_{\gamma,\epsilon}$ we have $\Pr(e \notin \mathcal{E}_{T,\gamma}) < \epsilon/2$. The 
remaining term in the upper bound (17) can be made smaller than $\epsilon/2$ for sufficiently large $T$ if $R < R_{\rm LAST}$. 
B. Proof of Theorem 1 

Proof: Consider an $m = 2MT$-dimensional lattice code $\mathcal{C}(\Lambda_c, u_0, \mathcal{R})$ (that corresponds to a generator 
matrix $G$) with fundamental volume $V_c$ that satisfies (5), for some fixed translation vector $u_0$, where $\mathcal{R}$ is 
the $m/2$-dimensional hypersphere with radius $\sqrt{m/2}$ centered at the origin. 

The input to the MMSE-DFE lattice sequential decoder is the vector $y' = Q^T y$, where $Q$ is an orthogonal 
matrix that corresponds to the QR decomposition of the channel-code matrix $BG = QR$. The associated 
path metric in this case is given by (9). 

Consider the Stack algorithm with bias $b > 0$. Let $E_s$ be the event that the Stack decoder makes an 
erroneous detection. Due to lattice symmetry, we can assume that the all-zero lattice point, i.e., 0, was 
transmitted. For a given lattice $\Lambda_c$, the frame error rate of the lattice Stack sequential decoder is upper bounded as 

$$\Pr(E_s|\Lambda_c) \overset{(a)}{\le} \Pr\Big(\bigcup_{z \in \mathbb{Z}^m \setminus \{0\}} \{\mu(z) > \mu_{\min}\}\Big) \overset{(b)}{\le} \Pr\Big(\bigcup_{x \in \Lambda_c^*} \{|Bx|^2 - 2(Bx)^T e < bm\}\Big) = \Pr\Big(\bigcup_{x \in \Lambda_c^*} \Big\{2(Bx)^T e > |Bx|^2 \Big(1 - \frac{bm}{|Bx|^2}\Big)\Big\}\Big), \quad (18)$$

where $\Lambda_c^* = \Lambda_c \setminus \{0\}$, $\mu_{\min} = \min\{0, b - |{e'}_1^1|^2, 2b - |{e'}_1^2|^2, \ldots, bm - |{e'}_1^m|^2\}$ is the minimum metric that 
corresponds to the transmitted path, $e' = Q^T e$, (a) is due to the fact that in general, $\mu(z) > \mu_{\min}$ is just 
a necessary condition for $x = Gz$ to be decoded by the Fano decoder, and (b) follows by noticing that 
$-(\mu_{\min} + |e'|^2) \le 0$. It is clear from the above analysis that the lattice Stack sequential decoder approaches 
the performance of the lattice decoder as $b \to 0$. We make use of the fact that 

$$|Bx|^2 \ge \min_{x \in \Lambda_c^*} |Bx|^2 = (2 r_{\rm pack}(BG))^2,$$

where $r_{\rm pack}(BG)$ is the packing radius of the lattice $\Lambda(BG)$. Let $b = b' (2 r_{\rm pack}(BG))^2 / m$, where $0 < b' < 1$ 
is a constant independent of the lattice $\Lambda(BG)$. Then, we can further upper bound (18) as 

$$\Pr(E_s|\Lambda_c) \le \Pr\Big(\bigcup_{x \in \Lambda_c^*} \{2(B'x)^T e > |B'x|^2\}\Big), \quad (19)$$

where 

$$B' = (1 - b') B. \quad (20)$$

The RHS of the upper bound (19) corresponds to the probability of decoding error of a received signal 
$y = B'x + e$ decoded using lattice decoding. It is clear from (20) that $B'$ is invertible. In this case, we 
obtain the equivalent channel output 

$$\tilde{y} = B'^{-1} y = x + B'^{-1} e.$$

Next, we apply the ambiguity decoder with decision region 

$$\mathcal{E}'_{T,\gamma} = \{z \in \mathbb{R}^m : z^T B'^T B' z \le MT(1 + \gamma)\}. \quad (21)$$

The probability of making a decoding error can then be upper bounded by 

$$\Pr(E_s|\Lambda_c) \le \Pr(e \notin \mathcal{E}'_{T,\gamma}) + \Pr(\mathcal{A}(\mathcal{E}'_{T,\gamma})). \quad (22)$$

Applying Lemma 1, there exists a lattice $\Lambda_c^*$ and a translation vector $u_0$ with error probability satisfying 

$$\Pr(E_s, \Lambda_c^*) \le (1 + \epsilon')\, 2^{-T\left[\log\det(B'^T B')^{1/2T} - R\right]} + \Pr(e \notin \mathcal{E}'_{T,\gamma}). \quad (23)$$



Now, one can show that 



/ B /^r(m/2 + l) 2 / m V(S m (2r pack (BG))) 2/m 
(2r pack (£G)) 2 /m = 2 W"_W m7r2 ^ ST /M 

(a) oH T , AST /M ^(^(2r pack (Bg)))2/m 

_ 2 ^AST/Af ygAn K(5 m (2r pack (BG))) 2 /- 

2 V(K) 2 l m v c 2/m V{V{B)) 2 / m 
W 2[^A ST -fl]/M v( 5m (2r pack ( J BG))) 2 / m 



2 y(V(5G)) 2 /™ 
{J 2^A ST -/?]/M 1 / (tgm(2rpack(jBg))) 2/ m ^ 2 [i? LAST -fi]/Af / 2r pack (gG) \ 2 

2 ^(<S m (r eff (£G9)) 2 A* 2 V r cS (BG) J 



(24) 



where (a) follows from the fact that V(JZ) is the volume of the m-dimensional hypersphere of radius 
\Jm/2, (b) follows from the fact that for the shifted lattice code used in Lemma 1 we have 

T/2/m 

1 ' > 2~ R / M 



v{n) 2 /r 

and (c) follows from the definition of the effective radius of the lattice generated using the matrix BG. 



Therefore, we can further upper bound ( |23[ ) as 

Pr(£ s , A*) < (1 + e ') 2 -^°gdet(B T B)V 2 T_ i?] + pr(e ^ ^ (25) 

where B is given by 

B = V ~ 2l^-W"(2r J ^ k (BG)/r aS (BG))*) B ' (26) 
which is valid for all values of b < 2 [RhAST - R ~ 1] (2r peiCk (BG c )/r cS (BG c )) 2 . Noticing that 

/_ T _xl/2T / 2b \2m/2T 

Aet ( BB ) = I 1 - ^m^m^BCw ) det (B B) 

( 2b \ 2M 

- (' - 2^-m. {2rpM)M BG c)r ) det(/„ + ^r^), (27) 

and by solving for R, we achieve the desired result. ■ 
As discussed earlier, choosing a fixed but not very large value of $b$ may result in achieving the optimal 
DMT of the channel. However, lattice sequential decoders are used as an alternative to ML and lattice 
decoders to achieve very low decoding complexity, and to do so one has to resort to large values of $b$. As 
will be shown in the sequel, choosing large values of $b$ may lead to a loss in the diversity gain and/or the 
multiplexing gain, and as a result, a loss in the optimal tradeoff. 

C. Achievable DMT: Variable Bias Term 

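Our goal in this section is to derive the achievable outage performance for a general (not necessarily 
fixed) bias term $b$. Denote by $0 \le \lambda_1 \le \cdots \le \lambda_M$ the eigenvalues of $(H^c)^H H^c$. Consider $b$ as a function of 
$\rho$ and $\lambda = (\lambda_1, \cdots, \lambda_M)$, and express it as 

$$b(\lambda, \rho) = \frac{1}{2}\left[\left(\frac{\prod_{i=1}^M (1 + \rho\lambda_i)}{\eta(\lambda, \rho)}\right)^{1/M} - \left(\frac{\prod_{i=1}^M (1 + \rho\lambda_i)}{\eta(\lambda, \rho)}\right)^{1/2M}\right] \left(\frac{2 r_{\rm pack}(BG_c)}{r_{\rm eff}(BG_c)}\right)^2. \quad (28)$$

Then, one can easily show that by substituting $b$ in (12), we get 

$$R_b(\lambda, \rho) = \log \eta(\lambda, \rho). \quad (29)$$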



To fully characterize the achievable outage performance of lattice sequential decoders as a function of 
the bias, we allow $b$ to vary with SNR and the channel eigenvalues as in (28). We define the outage event 
under lattice sequential decoding as 

$$\mathcal{O}_b(\rho) \triangleq \{H^c : R_b(H^c, \rho) < R\}.$$

Denote $R = r \log\rho$. The probability that the channel is in outage, $P_{\rm out}(\rho, b) = \Pr(\mathcal{O}_b(\rho))$, can be evaluated 
as follows: 

$$P_{\rm out}(\rho, b) = \Pr(\log \eta(\lambda, \rho) < R). \quad (30)$$



The term $\eta(\lambda, \rho)$ can be chosen freely between 1 (zero rate) and $\prod_{i=1}^M (1 + \rho\lambda_i)$ (which yields the maximum 
achievable rate under MMSE-DFE lattice decoding). Depending on the value of $\eta(\lambda, \rho)$ we obtain different 
achievable rates and hence different outage performances. However, in our analysis and for the sake of 
simplicity, we let 

$$\eta(\lambda, \rho) = \phi \prod_{i=1}^M (1 + \rho\lambda_i)^{\zeta_i}, \quad (31)$$

where $0 < \phi \le 1$ is a constant, and $\zeta_i$, $\forall 1 \le i \le M$, are constants that satisfy the following two 
constraints: $\sum_{i=1}^M \zeta_i = M$ and $\zeta_1 \ge \zeta_2 \ge \cdots \ge \zeta_M \ge 0$. For example, setting $\eta(\lambda, \rho) = \phi \prod_{i=1}^M (1 + \rho\lambda_i)$ 
(i.e., uniform values of $\zeta_i$, $\zeta_i = 1$, $\forall i = 1, \cdots, M$) we achieve the optimal DMT in the sense that for 
such a choice of $\eta(\lambda, \rho)$ we have 

6 = V 1/M [1-0 1/2W ], 

which is fixed. This result agrees with Theorem 2. 
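Now define $\nu_i = -\log\lambda_i / \log\rho$; then 

$$P_{\rm out}(\rho, b) = \Pr\Big(\log \phi \prod_{i=1}^M (1 + \rho\lambda_i)^{\zeta_i} < r \log\rho\Big) \doteq \Pr\Big(\sum_{i=1}^M \zeta_i (1 - \nu_i)^+ < r\Big), \quad (32)$$

where $(x)^+ = \max\{0, x\}$, and $\log\phi$ can be neglected in the high SNR regime. The typical outage event, 
at high SNR, can be written as 

$$\mathcal{O}_b^+(\zeta_1, \cdots, \zeta_M) = \Big\{\nu : \sum_{i=1}^M \zeta_i (1 - \nu_i)^+ < r\Big\}.$$

In this case, the outage probability can be evaluated as follows: 

$$P_{\rm out}(\rho, b) = \int_{\mathcal{O}_b^+(\zeta_1, \cdots, \zeta_M)} f_\nu(\nu)\, d\nu,$$

where $f_\nu(\nu)$ is the joint probability density function of $\nu$ which, for all $\nu \in \mathcal{O}_b^+(\zeta_1, \cdots, \zeta_M)$, is 
asymptotically given by [11] 

$$f_\nu(\nu) \doteq \exp\Big(-\log(\rho) \sum_{i=1}^M (2i - 1 + N - M)\nu_i\Big). \quad (33)$$

Applying Varadhan's lemma as in [27], we obtain 

$$P_{\rm out}(\rho, b) \doteq \rho^{-d_b(r)},$$

where 

$$d_b(r) = d(r, \zeta) = \inf_{\nu \in \mathcal{O}_b^+(\zeta_1, \cdots, \zeta_M)} \sum_{i=1}^M (2i - 1 + N - M)\nu_i,$$

where $\zeta = (\zeta_1, \cdots, \zeta_M)$. It is clear from the above optimization problem that $d_b(r)$ depends critically 
on the selected coefficients $\zeta$ (or equivalently $b$). Since the $\zeta_i$ are ordered, one can assume without loss of 
generality of the optimal solution that $1 \ge \nu_1 \ge \cdots \ge \nu_M \ge 0$. The linear optimization problem is 
therefore equivalent to the following problem 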

( M 

Minimize : ^(2i - 1 + N - M)v { 
Such that : < v { < 1 Vi > 2 

M 

i=i 

where $\zeta_i \in [0, M]$. We now arrive at the following results: 

. Case 1: (0 < Q < M, and d = M) We have the following: 
- If r = 0, the optimal solution is 



M 



1. 



- If r ^ 0, the optimal solution is 



ia = mm 



/M 



r ,1 



> 3=1 



and the DMT is given by 



4(0) = MN, 



M 



\fi > 1, 



r/(r.O = $>,-! + A" 



(34) 



(35) 



An interesting remark about this DMT is that the maximum diversity $d(0, \zeta) = MN$ is independent of 
$\zeta_i$, $\forall i \ge 1$. Moreover, other than the uniform assignment $\zeta = (1, \cdots, 1)$, the optimal DMT cannot 
be achieved. 



Case 2: ($\zeta_i = 0$ for some $i$) For such choices of $\zeta_i$, it is clear that the optimal DMT is lost, i.e., 
$d_b(r) < (M - r)(N - r)$ for all $r = 0, 1, \cdots, M$. The maximum diversity achieved in this scenario 
can be easily shown to be given by 

$$d(0, \zeta) = MN - \sum_{i=1}^M (2i - 1 + N - M)\,\delta(\zeta_i),$$

where $\delta(\zeta_i) = 1$ if $\zeta_i = 0$ and 0 otherwise. 

Example 1. Consider a LAST coded $M \times N$ MIMO channel under lattice sequential decoding with 
$\zeta_1 = M$, and $\zeta_i = 0$ for all $i \ne 1$. In this case, the achievable rate is given by $R_b(\rho, \lambda_1) = 
M \log(1 + \rho\lambda_1)$. The asymptotic outage probability can be expressed as 

$$P_{\rm out}(r \log\rho) = \Pr(M \log(1 + \rho\lambda_1) < r \log\rho) = \Pr(\rho\lambda_1 < \rho^{r/M} - 1) \doteq \Pr\big(\lambda_1 < \rho^{-(1 - r/M)^+}\big) \doteq \rho^{-(N - M + 1)(1 - r/M)^+}, \quad (36)$$

where we have used the fact that $\Pr(\lambda_1 < \epsilon) \approx \epsilon^{N - M + 1}$ as $\epsilon \to 0$ [10]. Therefore, for such a value of 
$b$, the best DMT that can be achieved by the decoder is $d_b(r) = (N - M + 1)(1 - r/M)^+$. 
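To make the loss in Example 1 concrete, the short sketch below compares the diversity gain $(N - M + 1)(1 - r/M)^+$ with the optimal value $(M - r)(N - r)$ at integer multiplexing gains. The function names are hypothetical, and restricting the comparison to integer $r$ is an assumption made to keep the sketch simple.

```python
def dmt_optimal(r, num_tx, num_rx):
    """Optimal diversity gain (M - r)(N - r) at an integer multiplexing gain r."""
    return (num_tx - r) * (num_rx - r)


def dmt_example1(r, num_tx, num_rx):
    """Diversity gain (N - M + 1)(1 - r/M)^+ obtained in Example 1."""
    return (num_rx - num_tx + 1) * max(0.0, 1.0 - r / num_tx)


# 2 x 2 channel: compare both curves at r = 0, 1, 2.
optimal_curve = [dmt_optimal(r, 2, 2) for r in range(3)]
example1_curve = [dmt_example1(r, 2, 2) for r in range(3)]
```

For the $2 \times 2$ channel the maximum diversity drops from 4 to 1 under this choice of coefficients, while both curves reach zero at $r = 2$.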
Interestingly, for Case 1, one can derive a closed form for the achievable DMT as given in the following 
theorem: 

Theorem 3. The DMT, $d_b(r)$, for an $M$-transmit, $N$-receive antenna coded MIMO Rayleigh channel 
under MMSE-DFE lattice Fano/Stack sequential decoding with bias $b$ as given in (28) and coefficients 
$\zeta_i \in (0, M)$, $\forall 1 \le i \le M$, with $\sum_i \zeta_i = M$, is the piecewise-linear function connecting the points 
$(r(k), d(k))$, $k = 0, 1, \cdots, M$, where 

$$r(0) = 0, \quad r(k) = \sum_{i=M-k+1}^{M} \zeta_i, \; 1 \le k \le M, \qquad d(k) = (M - k)(N - k), \; 0 \le k \le M. \quad (37)$$

Proof: By solving the above optimization problem, we obtain the following DMT: 



d(r,C) 



' M-k-l 

J2 (2i-l + N-M)+ 
i=i 

2(M — k) — l + N — M 



N-M + l 



(,M-k 
M 



M 

E G 

\j=M-k 



r e [r k ,r k+1 ], < k < M - 2; (3g) 
r e [r M -i,r M ), 



where 



0, k = 0; 

'V. \ M 

£ Ci, 1<A;<M. 



.i=M— fc+i 



Substituting in (38), we get the DMT expression in (37) 
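The piecewise-linear curve of Theorem 3 is easy to evaluate numerically. The following is a minimal sketch under the theorem's assumptions (coefficients $\zeta_i > 0$ sorted in non-increasing order and summing to $M$); the function names and the example coefficients are hypothetical.

```python
def dmt_breakpoints(zeta, num_tx, num_rx):
    """Corner points (r(k), d(k)) of Theorem 3 for k = 0, ..., M."""
    points = [(0.0, float(num_tx * num_rx))]
    r = 0.0
    for k in range(1, num_tx + 1):
        # r(k) adds zeta_{M-k+1} (1-based), i.e. zeta[M-k] (0-based).
        r += zeta[num_tx - k]
        points.append((r, float((num_tx - k) * (num_rx - k))))
    return points


def dmt(r, zeta, num_tx, num_rx):
    """Linear interpolation between consecutive corner points."""
    points = dmt_breakpoints(zeta, num_tx, num_rx)
    for (r0, d0), (r1, d1) in zip(points, points[1:]):
        if r0 <= r <= r1:
            return d0 + (d1 - d0) * (r - r0) / (r1 - r0)
    raise ValueError("r must lie in [0, M]")


uniform = dmt(0.5, [1.0, 1.0], 2, 2)         # optimal curve at r = 0.5
unbalanced = dmt(0.5, [1.5, 0.5], 2, 2)      # unbalanced coefficients
unbalanced_r1 = dmt(1.0, [1.5, 0.5], 2, 2)
```

With uniform coefficients the corner points fall on the integers and the optimal curve is recovered. With the unbalanced choice the corner points shift toward smaller multiplexing gains, so the curve lies below the optimal one for $0 < r < M$.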



Example 2. Consider a $2 \times 2$ MIMO channel. The DMT curves achieved with respect to different values 
of $\zeta_i$ that correspond to Case 1 and Case 2 are illustrated in Fig. 1. Although the diversity at $r = 0$ is 
not affected by nonzero coefficients $\zeta_i$ ($d(0) = 4$), the more unbalanced the coefficients are, the worse the 
DMT is. 

It is clear from the above analysis that by varying $\zeta_i$ and correspondingly varying $b$, one can fully 
control the maximum diversity and multiplexing gains achieved by such a decoding scheme. Fig. 2 shows 
the achievable DMT curves under lattice sequential decoding for all possible values of $\zeta_i$ that satisfy the 
constraint $\sum_{i=1}^M \zeta_i = M$. The figures include both Case 1 and Case 2. 

Following the footsteps of [21], we are now ready to prove the following theorem: 

Theorem 4. There exists a sequence of full-dimensional LAST codes with block length $T \ge M + N - 1$ 
that achieves the DMT curve $d_b(r)$ under LAST coding and MMSE-DFE lattice Fano/Stack sequential 
decoding with variable bias term $b$ that is given in (28). 

Proof: See Appendix III. 
D. Improving the Achievable Rate 



It is clear from (12) that lattice sequential decoders suffer from very poor performance as $b$ becomes 
large (the achievable rate $R_b$ could reach 0). The question that may arise here is whether the achievable rate 
of the decoder can be improved, especially for large values of $b$ (for which low decoding complexity is 
to be expected [18]), thereby improving the error performance. 

It turns out that the way the nodes are generated in the algorithm plays an important role in improving 
both the achievable rate and performance of the decoder without increasing the decoding complexity. For 
example, Schnorr-Euchner enumeration is considered a good candidate for use in lattice Fano/Stack 
sequential decoding algorithms [18]. If the determination of best and next best nodes in the lattice 
Fano/Stack sequential decoder is based on the Schnorr-Euchner search strategy, then as $b \to \infty$ the decoder 



reduces to the MMSE-DFE decoder QUI , which achieves a DMT given by (N — M + l)(l — r /M) + [[H . 
Corollary 1. For a fixed non-random channel matrix $H^c$, the rate 

$$R_b(H^c, \rho) \triangleq \max\left\{ R_{\rm LAST}(H^c, \rho) - 2M \log\left(\frac{1 + \sqrt{1 + 8\alpha b}}{2}\right),\; R_{\rm MMSE\text{-}DFE}(H^c, \rho) \right\} \quad (39)$$

is achievable by LAST coding and MMSE-DFE lattice Fano/Stack sequential decoding constructed under 
the Schnorr-Euchner search strategy, where $R_{\rm MMSE\text{-}DFE}(H^c, \rho)$ is the achievable rate of the MMSE-DFE 
decoder, and $\alpha$ is as defined in (13). 



In what follows, we discuss some interesting results about low computational complexity receivers. 

E. MMSE-like Receivers: Large N Analysis 

The main role of the bias term $b$ is to control the amount of computations performed by the decoder. 
The computational complexity of the lattice sequential decoder is defined as the total number of nodes 
visited by the decoder during the search. It has been shown in [18], via simulation, that there exists a value 
of $b$, say $b^*$, such that for all $b > b^*$, the computational complexity decreases monotonically with $b$. As 
$b \to \infty$, the number of visited nodes is always equal to $m$ (the computational complexity of the MMSE-DFE 
decoder). 

It is clear from the above analysis that increasing the bias $b$ can affect both the diversity and multiplexing 
gains achieved by such a decoding scheme. However, we would like to show that at $r = 0$ (i.e., at fixed 
rate $R$), there exists a lattice sequential decoding algorithm that can simultaneously achieve computational 
complexity $m$ and maximum diversity $d = MN$. 



Consider the bias term given in (28) with $\eta(\lambda, \rho) = \prod_{i=1}^M (1 + \rho\lambda_i)^{\zeta_i}$, where the coefficients $0 < \zeta_i < 1$ 
are chosen according to Case 1 such that $\zeta_i = \epsilon$ for all $i$. In this case, as $\rho \to \infty$, it can be easily verified 
that $b \doteq \rho^{\frac{1}{M}(1 - \epsilon)\sum_{i=1}^M (1 - \alpha_i)^+}$. The probability that $b$ exceeds $\rho^{\kappa/M}$, for $0 < \kappa < M$, can be evaluated as 
follows: 

$$\Pr\Big((1 - \epsilon)\sum_{i=1}^M (1 - \alpha_i)^+ > \kappa\Big) = 1 - \Pr\Big(\sum_{i=1}^M (1 - \alpha_i)^+ < \frac{\kappa}{1 - \epsilon}\Big) \doteq 1 - \rho^{-\left[\left(M - \frac{\kappa}{1 - \epsilon}\right)\left(N - \frac{\kappa}{1 - \epsilon}\right)\right]^+}.$$

It is clearly seen that, as $N$ becomes large, with probability close to 1 the bias term $b \to \infty$ as $\rho \to \infty$. 
Therefore, for such a choice of $\eta(\lambda, \rho)$, at high SNR we can achieve linear computational complexity, but 
at the expense of losing the optimal tradeoff. However, as argued in the proof of Theorem 3, at $r = 0$ we 
have $d = MN$. Therefore, as $\rho \to \infty$, linear computational complexity $m$ and maximum diversity gain 
$MN$ can be achieved simultaneously for large values of $N$. We can conclude that there exists a lattice 
sequential decoding algorithm that achieves the ML decoder's diversity gain, $MN$, at $r = 0$ (fixed rate $R$) 
when $N \to \infty$. 

IV. Computational Complexity: Tail Distribution in the High SNR Regime 

Lattice sequential decoders are constructed as an alternative to sphere decoders (or equivalently lattice 
decoders) to solve the CLPS problem with much lower computational complexity. Due to the random 
nature of the channel matrix and the additive noise, the computational complexity of both decoders is 
considered difficult to analyze in general. As such, most of the work related to such analysis has been 
performed via first and second order statistics of the complexity [24], [25], [34]. However, in [35], Seethaler 
et al. took a different path and analyzed the sphere decoder through its complexity tail distribution, 
defined as $\Pr(C \ge L)$, where $C$ is the total number of computations performed by the decoder and $L$ 
is the distribution parameter. This approach follows naturally from the randomness of the computational 
complexity of such a decoding scheme. It has been shown in [33] that, for large $L$ (i.e., as $L \to \infty$), the 
complexity distribution of the sphere decoder is of Pareto type, given by $L^{-(N - M + 1)}$. 

As discussed earlier, the bias term $b$ is responsible for the performance-complexity tradeoff achieved 
by the lattice sequential decoders [18]. For example, setting $b = 0$, we achieve the best performance 
(the performance of the sphere decoder) but at the expense of very large decoding complexity. At the other 
extreme, setting $b = \infty$, the lattice sequential decoder that uses Schnorr-Euchner enumeration becomes 
equivalent to the MMSE-DFE decoder. Although it achieves very low decoding complexity, it suffers 
from poor performance. In our work, we consider the case of fixed (finite) $b$. It turns out that for fixed 
but not large values of $b$, the complexity distribution's tail exponent $e(r)$, defined by 

$$e(r) = \lim_{\rho \to \infty} \frac{-\log \Pr(C \ge L)}{\log\rho},$$

is asymptotically lower bounded by the DMT achieved by the LAST coding and sequential decoding 
schemes, i.e., $e(r) \ge d_{\rm out}(r)$, and does not depend on the bias term in the high SNR regime. However, 
increasing the value of $b$ could significantly lower the computational complexity (e.g., as $b \to \infty$, 
$\Pr(C \ge L) = 0$ for $L > m$) but at the expense of a great loss in the achievable DMT. 

We consider only lattice codes that are DMT optimal. Also, for the sake of simplicity we consider the 
Stack algorithm in analyzing the decoder's computational complexity. It must be noted that the following 
analysis is only valid for finite but small values of $b$. 

In this section, we would like to analyze the computational complexity of the MMSE-DFE lattice 
Stack sequential decoder with bias term b > 0, particularly at the high SNR regime. We are interested in 
bounding the tail distribution of the decoder's computational complexity at high SNR. 

Theorem 5. The asymptotic computational complexity distribution of the MMSE-DFE lattice sequential 
decoder in an $M \times N$ LAST coded MIMO channel with codeword length $T \ge N + M - 1$ is upper 
bounded by the asymptotic outage probability, i.e., 

$$\Pr(C \ge L) \;\dot{\le}\; \rho^{-d^*_{\rm out}(r)} \quad (40)$$

for all $L$ that satisfy 

$$L \ge m + \sum_{k=1}^{m} \frac{\pi^{k/2} \left[bk + MT(1 + \log\rho)\right]^{k/2}}{\Gamma(k/2 + 1) \det(R_{kk}^T R_{kk})^{1/2}}, \quad (41)$$

where $R_{kk}$ is the lower $k \times k$ part of $R = Q^T BG$, and $d^*_{\rm out}(r) = (M - r)(N - r)$. 

Proof: The input to the decoder, after QR preprocessing ($BG = QR$) of (1), is given by $y' = Q^T y = 
Rz + e'$, where $e' = Q^T e$. Let $\mu_{\min} = \min\{0, b - |{e'}_1^1|^2, 2b - |{e'}_1^2|^2, \ldots, bm - |{e'}_1^m|^2\}$ be the minimum 
metric that corresponds to the transmitted path. Without loss of generality, we assume that $N \ge M$. Due 
to lattice symmetry, we assume that the all-zero codeword, i.e., 0, was transmitted. 
First, let 

$$C = \sum_{k=1}^{m} \sum_{z_1^k \in \mathbb{Z}^k} \phi(z_1^k)$$

be a random variable that denotes the total number of visited nodes during the search, where $\phi(z_1^k)$ is the 
indicator function defined by 

$$\phi(z_1^k) = \begin{cases} 1, & \text{if node } z_1^k \text{ is extended}; \\ 0, & \text{otherwise}. \end{cases}$$

In this case, the computational complexity tail distribution can be expressed as $\Pr(C \ge L)$, where $L$ 
is the distribution parameter. Now, a node at level $k$, i.e., $z_1^k$, may be extended by the Stack decoder if 
$\mu(z_1^k) > \mu_{\min}$, or equivalently, if $|{e'}_1^k - R_{kk} z_1^k|^2 < bk - \mu_{\min}$. The difficulty in analyzing the computational 
complexity of the lattice Stack sequential decoder stems from the fact that the distribution of the partial 
matrix $R_{kk}$ is hard to obtain in general. Another factor that may complicate the analysis is $\mu_{\min}$, which is 
a noise-dependent term. However, we can simplify the analysis by considering the following. 
First, the complexity tail distribution can be upper bounded as 

$$\Pr(C \ge L) \le \Pr(C \ge L, |e'|^2 \le R_s^2) + \Pr(|e'|^2 > R_s^2), \quad (42)$$

where $R_s^2 > 0$. 

Next, we would like to further upper bound the first term in the RHS of (42). We can first write 



(zi) as 



1, if ^X-RmzW 2 <bk-/j mi 
0, otherwise, 

Given \e'\ 2 < R 2 , and by noticing that — (fi mm + |e'| 2 ) < 0, we obtain 



where 



Now, let 



z\&L k 



1, if \e!\-R kk z\\ 2 < bk + R 
0, otherwise. 

S k , if \e' — Rz\ 2 < bm — /i min 
0, otherwise, 



(43) 



(44) 



where 



(45) 



then it can be easily shown that 

m m 
k=l zeZ m k=l xeA c 

where 

{S k , if \Bx\ 2 -2{Bx) T e< bm; 
■ 
0, otherwise, 

Notice the independence of the above upper bound on /i m i n . Consider now the following lemma: 

Lemma 1. In the lattice Stack sequential decoder with finite bias $b > 0$, the number of visited nodes at 
level $k$, given that $|e'|^2 \le MT(1 + \log\rho)$, can be upper bounded by 

$$S_k \le \frac{\pi^{k/2} \left[bk + MT(1 + \log\rho)\right]^{k/2}}{\Gamma(k/2 + 1) \det(R_{kk}^T R_{kk})^{1/2}},$$

where $S_k$ is as defined in (45). 



(46) 



(47) 



Proof: See Appendix III. 
For a given lattice A c , we have 

Pr(C > L\A C , \e'\ 2 < MT(1 + logp)) < Pr((7 > L — m\A c , \e'\ 2 < MT(1 + logp)) 

<E ,{C|A c ,|ef <MT(. + Iw » fori>m _ 
L — m 

where the last inequality follows from using Markov inequality, and C* is defined as 

fe=i zjez fc \{0} 

since we have assumed that the all-zero lattice point was transmitted. 

The conditional average of C with respect to the noise can be further upper bounded as 

in 

E e /{(7|A C , \e'\ 2 < MT(1 + logp)} < S k Pl (\ Bx \ 2 ~ 2(Bx) T e < bm) (48) 

k=l x&A* 



Therefore, we have 



Pr(C > L\A C , \e\ 2 < MT(1 + logp)) < 2 £ fc = 1 k ^(\Bx\ 2 - 2{Bx) T e < bm). (49) 

Following the proof of Theorem 2 (see Appendix I), and by averaging over the ensemble of random 
lattices we get, for L > m + XTfcLi $k 

Pr(C > L) < p- TE ,=i (i-aj) r ]_ (50) 

Define B = {u E : V\ > • • • > vm > 0, ^^(1 — < r}. By separating the event \y G A} from 
its complement, we obtain: 

Pr(C > L) < Yi{y G ^)+Pr(|ef > MT(1 + logp)) + Pr(C > L,v eA, \e'\ 2 < MT(l + logp)) (51) 



The behavior of the first term in (51) at high SNR is $\rho^{-d^*_{\rm out}(r)}$, where $d^*_{\rm out}(r)$ is as defined in Theorem 
2. The second term can be shown to be upper bounded by $\rho^{-d^*_{\rm out}(r)}$ (see [11]). Averaging the third term 
over the channels in the set $\mathcal{A}$, we obtain 



Pr(C > L) < p- d °^ + f j v {v) Pr(C > L\v) du < p" d - 

J A 



.('') 



(52) 



for all L > m + Y^T=i ^k> where f v (v) is the joint probability density function of v defined in (33). ■ 
The above results reveal that if the number of computations performed by the decoder exceeds 

$$L_0 = m + \sum_{k=1}^{m} \frac{\pi^{k/2} \left[bk + MT(1 + \log\rho)\right]^{k/2}}{\Gamma(k/2 + 1) \det(R_{kk}^T R_{kk})^{1/2}}, \quad (53)$$

the complexity distribution of the lattice sequential decoder at high SNR is upper bounded by the 
asymptotic outage probability. Now, if a "time-out" limit is imposed at the decoder to terminate the 
search when the number of computations exceeds this limit, then $L_0$ represents the minimum value that 
should be set by the decoder without resulting in a loss in the optimal DMT. To see this, suppose that 
the lattice (Stack) sequential decoder imposes a time-out limit so that the search is terminated once the 
number of computations reaches $L_0$, and hence the decoder declares an error. Assuming $E'_s$ is the event 
that the decoder makes an error when $C < L_0$, the average error probability in this case is given by 

$$P_e(\rho) = \Pr(E'_s \cup \{C \ge L_0\}) \le \Pr(E'_s) + \Pr(C \ge L_0) \;\dot{\le}\; \rho^{-d^*_{\rm out}(r)}. \quad (54)$$



This can be very beneficial in two-ways MIMO communication systems (e.g, MIMO automatic repeat 
request (HI), where the feedback channel can be used to eliminate the decoding failure probability. In 
applications where there is a hard-limit on the buffer size, the decoder declares an error when the 
complexity goes above the limit. 
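The time-out threshold can be computed directly once the determinants $\det(R_{kk}^T R_{kk})$ are known. The sketch below assumes that (53) reads $L_0 = m + \sum_{k=1}^m \pi^{k/2}[bk + MT(1 + \log\rho)]^{k/2} / (\Gamma(k/2 + 1)\det(R_{kk}^T R_{kk})^{1/2})$ and that the logarithm is natural; the function name, its arguments, and the toy values are hypothetical.

```python
import math


def timeout_threshold(dets, b, num_tx, block_len, rho):
    """Time-out limit L_0 under the reading of (53) stated above.

    dets[k-1] holds det(R_kk^T R_kk) for k = 1, ..., m with m = 2*M*T.
    The natural logarithm of rho is an assumption of this sketch.
    """
    m = 2 * num_tx * block_len
    if len(dets) != m:
        raise ValueError("expected one determinant per level k = 1..m")
    noise_radius_sq = num_tx * block_len * (1.0 + math.log(rho))
    total = float(m)
    for k, det_k in enumerate(dets, start=1):
        radius_sq = b * k + noise_radius_sq
        # Volume of a k-dimensional ball of squared radius radius_sq.
        ball_volume = (math.pi ** (k / 2) * radius_sq ** (k / 2)
                       / math.gamma(k / 2 + 1))
        total += ball_volume / math.sqrt(det_k)
    return total


# Toy values: M = T = 1 (m = 2), unit determinants, zero bias, rho = 1.
limit = timeout_threshold([1.0, 1.0], b=0.0, num_tx=1, block_len=1, rho=1.0)
```

Each summand is the volume of a $k$-dimensional ball divided by the volume of the fundamental region of the level-$k$ lattice, which is the usual estimate of the number of lattice points inside that ball. In the toy case the two summands are 2 and $\pi$, so the limit equals $4 + \pi$.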

It should be noted that the above analysis does not yield the full picture of the decoder's complexity in 
general. As mentioned previously, the complexity of the decoder depends critically on the bias $b$ chosen 
in the algorithm. Unfortunately, it is still unclear how the SNR exponent $e(r)$ is affected by the value of 
$b$ in general. However, as $b \to \infty$, the MMSE-DFE lattice sequential decoder under Schnorr-Euchner 
enumeration becomes equivalent to the MMSE-DFE decoder [26]. The total number of computations 
performed by this decoder is always equal to $m$. This corresponds to an SNR exponent $e(r) = \infty$. Thus, 
we can conclude that, at high SNR, as $b$ increases the SNR exponent $e(r)$ increases as well. 

Another criterion that is used to characterize the computational complexity of such a decoder is 
its average complexity. Since $L_0$ is random, it would be interesting to calculate the minimum average 
number of computations required by the decoder to terminate the search. This is considered next. 

V. Average Computational Complexity 

It is to be expected that when the channel is ill-conditioned (i.e., in outage) the computational complexity
becomes extremely large. Moreover, when the channel is in outage it is highly likely that the decoder
performs an erroneous detection. However, when the channel is not in outage, there is still a non-zero
probability that the number of computations will become large (see (52) and (53)). As such, it is sometimes
desirable to terminate the search even when the channel is not in outage. Therefore, we would like to
determine the minimum average number of computations that is required in order for the decoder to
determine when to terminate the search.

In other words, we would like to find the minimum average number of computations that is required
by the decoder to achieve the optimal DMT. This can be expressed as

$$L_{\mathrm{out}} = \mathbb{E}\{L_0(\mathbf{H}^c \in \overline{\mathcal{O}})\}. \tag{55}$$

Before we do that, we first study the asymptotic behavior of $L_0$. As mentioned in
Section I, we focus our analysis on nested LAST codes, specifically LAST codes that are generated using
Construction A, which is described below (see [6]).

We consider the Loeliger ensemble of mod-$p$ lattices, where $p$ is a prime. First, we generate the set of
all lattices given by

$$\Lambda_p = \kappa(\mathcal{C} + p\mathbb{Z}^{2MT}),$$

where $p \to \infty$, $\kappa \to 0$ is a scaling coefficient chosen such that the fundamental volume $V_f = \kappa^{2MT} p^{2MT-1} = 1$,
$\mathbb{Z}_p$ denotes the field of mod-$p$ integers, and $\mathcal{C} \subset \mathbb{Z}_p^{2MT}$ is a linear code over $\mathbb{Z}_p$ with generator matrix
in systematic form $[\mathbf{I}\ \mathbf{P}^T]^T$. We use a pair of self-similar lattices for nesting. We take the shaping lattice
to be $\Lambda_s = \phi\Lambda_p$, where $\phi$ is chosen such that the covering radius is 1/2 in order to satisfy the input power
constraint. Finally, the coding lattice is obtained as $\Lambda_c = \rho^{-r/2M}\Lambda_s$. Interestingly, one can construct a
generator matrix of $\Lambda_p$ as (see El)

$$\mathbf{G}_p = \kappa \begin{pmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{P} & p\mathbf{I} \end{pmatrix}, \tag{56}$$

which has a lower triangular form. In this case, one can express the generator matrix of $\Lambda_c$ as
$\mathbf{G} = \rho^{-r/2M}\mathbf{G}'$, where $\mathbf{G}' = \phi\mathbf{G}_p$. The lower triangular form of $\mathbf{G}$ is useful for the following reason.
If $\mathbf{M}$ is an arbitrary $m \times m$ full-rank matrix and $\mathbf{G}$ is an $m \times m$ lower triangular matrix, then one can
easily show that

$$\det\big((\mathbf{M}\mathbf{G})_{kk}\big) = \det(\mathbf{M}_{kk}) \det(\mathbf{G}_{kk}), \tag{57}$$

where $(\mathbf{M}\mathbf{G})_{kk}$, $\mathbf{M}_{kk}$, and $\mathbf{G}_{kk}$ are the lower $k \times k$ parts of $\mathbf{M}\mathbf{G}$, $\mathbf{M}$, and $\mathbf{G}$, respectively.
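The block-determinant identity (57) can be checked numerically. The following is a minimal sketch in plain Python, assuming that the "lower $k \times k$ part" of a matrix means its bottom-right $k \times k$ block. The helper names (`det`, `matmul`, `lower_block`, `max_identity_error`) are illustrative and do not come from the paper.

```python
import random


def det(a):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]  # work on a copy
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    n, inner, p = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(p)]
            for i in range(n)]


def lower_block(a, k):
    """Bottom-right k x k block (assumed meaning of 'lower k x k part')."""
    n = len(a)
    return [row[n - k:] for row in a[n - k:]]


def max_identity_error(m=6, seed=1):
    """Largest |det((MG)_kk) - det(M_kk) det(G_kk)| over k = 1, ..., m."""
    rng = random.Random(seed)
    # Arbitrary (almost surely full-rank) matrix M.
    M = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(m)]
    # Lower triangular G with a nonzero diagonal.
    G = [[rng.uniform(0.5, 1.5) if j <= i else 0.0 for j in range(m)]
         for i in range(m)]
    MG = matmul(M, G)
    worst = 0.0
    for k in range(1, m + 1):
        lhs = det(lower_block(MG, k))
        rhs = det(lower_block(M, k)) * det(lower_block(G, k))
        worst = max(worst, abs(lhs - rhs))
    return worst


print(max_identity_error())
```

The identity holds because the last $k$ columns of a lower triangular matrix vanish outside its bottom-right block, so the bottom-right block of the product is the product of the bottom-right blocks; the printed value should therefore be at the level of floating-point round-off.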



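Using the above result, one can express the determinant that appears in (53) as

$$\det(\mathbf{R}_{kk}^T\mathbf{R}_{kk}) = \det(\mathbf{B}_{kk}^T\mathbf{B}_{kk})\det(\mathbf{G}_{kk}^T\mathbf{G}_{kk}) = \rho^{-rk/2M}\det(\mathbf{B}_{kk}^T\mathbf{B}_{kk})\det\big((\mathbf{G}'_{kk})^T\mathbf{G}'_{kk}\big). \tag{58}$$

Let $\mu_1 \le \mu_2 \le \cdots \le \mu_k$ be the ordered nonzero eigenvalues of $\mathbf{B}_{kk}^T\mathbf{B}_{kk}$, for $k = 1, \ldots, m$. Then,

$$\det(\mathbf{B}_{kk}^T\mathbf{B}_{kk}) = \prod_{i=1}^{k}\mu_i.$$

Note that for the special case when $k = m$ we have $\mu_{2(j-1)T+1} = \cdots = \mu_{2jT} = 1 + \rho\lambda_j((\mathbf{H}^c)^H\mathbf{H}^c)$, for
all $j = 1, \ldots, M$.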



Denote $\alpha_i = -\log\lambda_i((\mathbf{H}^c)^H\mathbf{H}^c)/\log\rho$. Using (58), one can asymptotically express $L_0$ as



L = m+(\ogp) m / 2 ^ogp) k/2 p c \ 



k=l 



(59) 



where 



1 



3=1 

(60)

Now, since $c_k$ is non-decreasing in $k$, we have at high SNR

$$L_0 \doteq m + (\log\rho)^{m/2}\rho^{c_m}, \tag{61}$$

where

$$c_m = T\sum_{i=1}^{M}\left(\frac{r}{M} - (1-\alpha_i)^+\right)^+.$$

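The average of $L_0$ at high SNR (averaged over the channel statistics) when the channel is not in outage
is given by

$$\begin{aligned}
\mathbb{E}\{L_0(\mathbf{H}^c \in \overline{\mathcal{O}})\} &= \int_{\boldsymbol{\alpha}\in\overline{\mathcal{O}}} L_0\, f_{\boldsymbol{\alpha}}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha} \\
&\doteq m + (\log\rho)^{m/2}\int_{\boldsymbol{\alpha}\in\overline{\mathcal{O}}} \exp\left(\log\rho\left[T\sum_{i=1}^{M}\left(\frac{r}{M} - (1-\alpha_i)^+\right)^+ - \sum_{i=1}^{M}(2i-1+N-M)\alpha_i\right]\right) d\boldsymbol{\alpha} \\
&\doteq m + (\log\rho)^{m/2}\rho^{l(r)},
\end{aligned}$$

where $\overline{\mathcal{O}} = \{\boldsymbol{\alpha} \in \mathbb{R}_+^M : \sum_{i=1}^{M}(1-\alpha_i)^+ \ge r\}$, and

$$l(r) = \max_{\boldsymbol{\alpha}\in\overline{\mathcal{O}}}\left[T\sum_{i=1}^{M}\left(\frac{r}{M} - (1-\alpha_i)^+\right)^+ - \sum_{i=1}^{M}(2i-1+N-M)\alpha_i\right]. \tag{62}$$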



It is not difficult to see that the optimal channel coefficients that maximize (62) are

$$\alpha_i^* = 1, \quad \text{for } i = 1, \ldots, M-k,$$

and

$$\alpha_i^* = 0, \quad \text{for } i = M-k+1, \ldots, M,$$

i.e., the same $\boldsymbol{\alpha}^*$ that achieves the optimal DMT of the channel. Substituting $\boldsymbol{\alpha}^*$ in (62), we get

$$l(r) = \frac{Tr}{M}(M-r) - (M-r)(N-r), \tag{63}$$

for $r = 0, 1, \ldots, M$. In this case, the asymptotic minimum average computational complexity that is
required by the decoder to achieve near-optimal performance (as well as the optimal DMT) can be
expressed as

$$L_{\mathrm{out}} = 2MT + (\log\rho)^{MT}\rho^{l(r)}. \tag{64}$$

The above result indicates that if the "average" number of computations performed by the
decoder exceeds $L_{\mathrm{out}}$, the decoder can terminate the search without affecting the optimal DMT. We discuss
here some special cases of the behavior of $L_{\mathrm{out}}$ in terms of the system parameters $\rho$, $M$, $N$, and $r$.
Consider the case $M = N$, and assume the use of an optimal random nested LAST code of codeword
length $T$ and fixed rate $R$, i.e., $r = 0$. In this case, one can see that $l_{\mathrm{mmse}}(0) < 0$ irrespective of the value
of $T$, i.e., the average complexity is bounded for all $T$. It is clear that the term $(\log\rho)^{MT}\rho^{-NM}$ decays
quickly to 0 as $\rho \to \infty$. The simulation results (introduced next) agree with the above analysis.

It is interesting to note that there exists a cut-off multiplexing gain, say $r_0$, such that the average
computational complexity of the decoder remains bounded as long as we operate below this value. This
value can be easily found by setting $l_{\mathrm{mmse}}(r_0) = 0$. This results in

$$r_0 = \frac{MN}{M+T}.$$
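As a quick sanity check on the cut-off value, the sketch below evaluates the complexity exponent and its root with exact rational arithmetic. It assumes that (63) reads $l(r) = \frac{Tr}{M}(M-r) - (M-r)(N-r)$ and that it may be evaluated at non-integer $r$; the function names are illustrative, not taken from the paper.

```python
from fractions import Fraction


def complexity_exponent(r, M, N, T):
    """Exponent l(r), assumed form of (63): (T*r/M)*(M - r) - (M - r)*(N - r)."""
    r = Fraction(r)
    return Fraction(T) * r * (M - r) / M - (M - r) * (N - r)


def cutoff_multiplexing_gain(M, N, T):
    """Cut-off multiplexing gain r_0 = M*N / (M + T)."""
    return Fraction(M * N, M + T)


if __name__ == "__main__":
    M, N, T = 2, 2, 3  # the 2 x 2 system with T = 3 used in the simulations
    r0 = cutoff_multiplexing_gain(M, N, T)
    print("r_0    =", r0)
    print("l(0)   =", complexity_exponent(0, M, N, T))
    print("l(r_0) =", complexity_exponent(r0, M, N, T))
```

Under these assumptions the exponent equals $-MN$ at $r = 0$, vanishes at $r_0$, and is negative for $0 \le r < r_0$, which is the regime in which the second term of (64) stays bounded.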

However, it should be noted that the above cut-off multiplexing gain corresponds to a sequential
decoding algorithm that uses a fixed bias. If we need to operate at $r > r_0$, larger values of the bias term
must be used. In fact, one must let $b$ scale with SNR as $b = \rho^{\epsilon}$ for some $\epsilon > 0$ in order to keep the
average complexity bounded when operating beyond $r_0$. However, according to the analysis provided in
Section III, this causes a loss in the optimal tradeoff. Therefore, the lattice sequential decoder provides a
systematic approach for trading off DMT, cut-off multiplexing gain, and complexity.

Another way to reduce the computational complexity without increasing the bias value is
to increase the number of receive antennas $N$. If we let $N \to \infty$, then one can achieve a multiplexing
gain $r = M$, which is the maximum multiplexing gain achieved by the channel.

To see the advantage of using the lattice sequential decoder with a constant bias term over the lattice
decoder implemented via sphere decoding algorithms, we compare the average computational complexity
of both decoders when MMSE-DFE preprocessing is applied. It has been shown in [39] that, for moderate-to-high
SNR, the average number of computations performed by the MMSE-DFE sphere decoder when the channel is not
in outage, say $L_{\mathrm{sphere}}$, for a system with signal dimension $m = 2MT$ is given by (assuming fixed rate, i.e.,
$r = 0$)

$$L_{\mathrm{sphere}} = 2MT + \frac{(\log\rho)^{2MT}}{\rho^{MN}}. \tag{65}$$

The ratio of the asymptotic average complexity of both decoders, say $\gamma$, is given by

$$\gamma = \frac{L_{\mathrm{sphere}}}{L_{\mathrm{sequential}}} = \frac{2MT + (\log\rho)^{2MT}/\rho^{MN}}{2MT + (\log\rho)^{MT}/\rho^{MN}}.$$

This is a large saving in computational complexity, especially for large signal dimensions and
moderate-to-high SNR. For example, consider the case of a $3 \times 3$ LAST coded MIMO system with $T = 5$. At
$\rho = 10^3$ (30 dB), we have $\gamma \approx 31$, i.e., the sphere decoder's complexity is about 31 times larger than
the complexity of the lattice sequential decoder. As will be shown in the sequel, simulation results agree
with the above theoretical results. For $\rho < 30$ dB, one would expect the ratio $\gamma \gg 31$. For extremely high
SNR values (e.g., $\rho \gg 30$ dB), it seems that $\gamma \to 1$ as $\rho \to \infty$.
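The quoted ratio can be reproduced with a few lines of Python. This is a rough sketch under two assumptions that are not stated explicitly in the text: the ratio is formed from (65) and from (64) evaluated at $r = 0$ (where $l(0) = -MN$), and the logarithm is taken to base 2, which is the choice that yields a value near 31 for the $3 \times 3$, $T = 5$ example. The function names are illustrative.

```python
import math


def sphere_complexity(M, N, T, snr, log=math.log2):
    """Average sphere-decoder complexity, read from (65) at r = 0."""
    m = 2 * M * T
    return m + log(snr) ** m / snr ** (M * N)


def sequential_complexity(M, N, T, snr, log=math.log2):
    """Average sequential-decoder complexity, read from (64) with l(0) = -MN."""
    m = 2 * M * T
    return m + log(snr) ** (M * T) / snr ** (M * N)


def complexity_ratio(M, N, T, snr, log=math.log2):
    """Ratio gamma of sphere to sequential average complexity."""
    return (sphere_complexity(M, N, T, snr, log)
            / sequential_complexity(M, N, T, snr, log))


if __name__ == "__main__":
    # 3 x 3 system, T = 5, SNR = 10^3 (30 dB); log base 2 is an assumption.
    print(complexity_ratio(3, 3, 5, 1e3))
    # At much higher SNR both second terms vanish and the ratio tends to 1.
    print(complexity_ratio(3, 3, 5, 1e9))
```

Passing `log=math.log` instead switches to natural logarithms; with that choice the same expressions give a ratio close to 1 at 30 dB, so the base of the logarithm matters when comparing against the quoted figure.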

VI. Numerical Results 

Throughout the simulation study, the fading coefficients are generated as independent identically
distributed circularly symmetric complex Gaussian random variables. The LAST code is obtained as an
$(m = 2MT, p, k)$ Loeliger construction (refer to [6] for a detailed description of the linear code obtained
via Construction A).

In Fig. 3, we compare the performance in terms of the frame error rate of a MIMO system with
$M = N = 2$, $T = 3$ and rate $R = 4$ bits per channel use (bpcu) under naive and MMSE-DFE lattice
sequential decoding. For both decoders we fix the bias term to $b = 0.6$. It is clear that the MMSE-DFE
lattice sequential decoder outperforms the naive one, where the former achieves diversity order of
4 (the maximum diversity gain achieved by the channel) and the latter achieves diversity order of 2.



To validate the achievability of the optimal DMT with LAST coding and MMSE-DFE lattice sequential 
decoding, we consider the performance of a MIMO system with M = N = 2, T = 3 for different rates 
R = 4, 8, 10.34 bpcu, which is illustrated in Fig. 4. The constant gap between the outage probability 
and the error performance for different R confirms our theoretical results. 

Fig. 5 and Fig. 6 show the effect of increasing the bias term on the diversity order and the average computational
complexity (number of visited nodes during the search) achieved by lattice sequential decoding. As
discussed earlier, increasing the bias term in the decoding algorithm significantly reduces decoding
complexity, but at the expense of losing diversity. For the $2 \times 2$ LAST coded MIMO system with $T = 3$,
as $b \to \infty$ we achieve linear computational complexity $m = 12$ for all SNR, and diversity order 1. For
sequential decoding algorithms that implement the Schnorr-Euchner enumeration, this corresponds to the
performance and the complexity of the MMSE-DFE decoder.

In our computational complexity distribution simulation, we consider a MIMO system with $M = N = 2$,
$T = 3$ for different rates $R = 4, 8$ bits per channel use. First, the frame error rate of the MMSE-DFE
lattice sequential decoder is plotted in Fig. 7(a) for $b = 0.6$. The computational complexity distribution
$\Pr(C > L)$ is plotted for such a decoder at different rates when $L$ is allowed to scale with the SNR as
$L = \rho$ (see Fig. 7(b)). It is clear from both figures that the curves which correspond to the error probability
and the computational complexity distribution match in slope, i.e., they both exhibit the same behavior at
high SNR. Equivalently, both curves have the same SNR exponent. This agrees with the derived
theoretical results.

The complexity saving advantage that lattice sequential decoders possess over lattice (sphere) decoders
is depicted in Fig. 8 and Fig. 9, for the same LAST coded MIMO channel with $R = 4$ bits per channel
use. One can notice the amount of computations saved by lattice sequential decoders for all values of
SNR, especially for large signal dimensions (see Fig. 10). Even at high SNR, the sphere decoder still
exhibits large decoding complexity compared to the lattice sequential decoder. For example, as depicted
in Fig. 10, at $\rho = 30$ dB, the average complexity of the sphere decoder is about 30 times the complexity
of the lattice sequential decoder for an optimal LAST coded MIMO system with dimension $m = 30$.
This is achieved at the expense of a small loss in performance (about 0.6 dB). This agrees with the derived
theoretical results.

Fig. 10 shows how the average complexity of the MMSE-DFE lattice sequential decoder decays with
the SNR irrespective of the codeword length $T$ for a fixed rate $R$, i.e., for $r = 0$. This agrees with the
theoretical results derived in the previous section. Finally, Fig. 11 confirms (by simulation) that the
MMSE-DFE lattice sequential decoder has a cut-off rate such that the average complexity of the decoder
remains bounded as long as we operate below it. The figure shows that for fixed $M$, $N$, and $T$, if we
increase the rate, the average complexity increases as well and becomes unbounded even at high SNR.

VII. Summary 

In this paper, we have provided a complete analysis of the performance limits of the lattice Fano/Stack
sequential decoder applied to the LAST coded MIMO system. The achievable rate of the channel is derived.
It turns out that the achievable rate under lattice sequential decoding depends critically on the decoding
parameter, the bias term. The bias term is responsible for the excellent performance-complexity tradeoff
achieved by such a decoding scheme. For fixed values of the bias, it has been shown that the optimal
tradeoff of the channel can be achieved. As the bias grows without bound, lattice sequential decoding
achieves linear computational complexity, where the total number of visited nodes during the search is
always equal to the lattice code dimension. As such, lattice sequential decoders bridge the gap between
lattice (sphere) decoders and low complexity receivers (e.g., the MMSE-DFE decoder). At high SNR, it
was argued that there exists a lattice sequential decoding algorithm that can achieve maximum diversity
gain at very low multiplexing gain, especially for a large number of receive antennas.

We have also provided a complete analysis of the computational complexity of the lattice sequential
decoder applied to LAST coded MIMO systems in the high SNR regime. It has been shown that for the
MMSE-DFE lattice sequential decoder, if the number of computations performed by the decoder exceeds
a certain limit, then the complexity's tail distribution becomes dominated by the outage probability with
an SNR exponent that is equivalent to the DMT achieved by the corresponding coding and decoding
schemes. The tradeoff of the channel is naturally extended to include decoding complexity. Moreover, the
decoder's asymptotic average computational complexity has also been analyzed. Finally, it has been shown
that there exists a cut-off multiplexing gain for which the average complexity remains bounded as long
as we operate below this value.

Appendix I 



Proof of Theorem 2 

The input to the decoder, after QR preprocessing ($\mathbf{B}\mathbf{G} = \mathbf{Q}\mathbf{R}$) of ([I]), is given by $\mathbf{y}' = \mathbf{Q}^T\mathbf{y} = \mathbf{R}\mathbf{z} + \mathbf{e}'$,
where $\mathbf{e}' = \mathbf{Q}^T\mathbf{e}$. Let $E_s$ be the event that the lattice Stack sequential decoder makes an erroneous detection.
Due to lattice symmetry, we assume that the all-zero codeword was transmitted. Now, any sequence
$\mathbf{x} = \mathbf{G}\mathbf{z} \ne \mathbf{0}$, $\mathbf{x} \in \Lambda_c$, can be decoded as the closest lattice point by the decoder only if its metric $\mu(\mathbf{z}_1^m)$
is greater than $\mu_{\min}$. Therefore, for a given lattice $\Lambda_c$,

$$\Pr(E_s|\Lambda_c) \le \sum_{\mathbf{z}\in\mathbb{Z}^m\setminus\{\mathbf{0}\}} \Pr(\mu(\mathbf{z}_1^m) > \mu_{\min}) = \sum_{\mathbf{z}\in\mathbb{Z}^m\setminus\{\mathbf{0}\}} \Pr(|\mathbf{e}' - \mathbf{R}\mathbf{z}|^2 < bm - \mu_{\min}), \tag{66}$$

where $\mu_{\min} = \min\{0,\ b - |(\mathbf{e}')_1^1|^2,\ 2b - |(\mathbf{e}')_1^2|^2,\ \ldots,\ bm - |(\mathbf{e}')_1^m|^2\}$ is the minimum metric that corresponds to
the transmitted path. The upper bound in (66) follows from the union bound and from the fact that, in
general, $\mu(\mathbf{z}_1^m) > \mu_{\min}$ is only a necessary condition for $\mathbf{x}$ to be decoded by the lattice Stack sequential
decoder. By noticing that $-(\mu_{\min} + |\mathbf{e}'|^2) \le 0$, we get

$$\Pr(E_s|\Lambda_c) \le \sum_{\mathbf{x}\in\Lambda_c^*} \Pr(|\mathbf{B}\mathbf{x}|^2 - 2(\mathbf{B}\mathbf{x})^T\mathbf{e}' < bm), \tag{67}$$

where $\Lambda_c^* = \Lambda_c\setminus\{\mathbf{0}\}$. Note the independence of the upper bound (67) on $\mu_{\min}$. We would now like to upper
bound the term inside the summation in (67). The difficulty here stems from the non-Gaussianity of the
random vector $\mathbf{e}'$ for any finite $T$. However, one can show (see [37] and [21]) that for a well-constructed
lattice the probability density function of the noise vector $\mathbf{e}'$ satisfies $f_{\mathbf{e}'}(\mathbf{v}) \le \beta_m f_{\tilde{\mathbf{e}}}(\mathbf{v})$, where $\tilde{\mathbf{e}} \sim \mathcal{N}(\mathbf{0}, 0.5\mathbf{I})$,
and $\beta_m$ is a constant (it has no effect at high SNR). Following the footsteps of ETTl, it can be shown that
by appropriately constructing a nested LAST code we have that

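$$\Pr(E_s|\Lambda_c) \le \beta_m \sum_{\mathbf{x}\in\Lambda_c^*} \Pr(|\mathbf{B}\mathbf{x}|^2 - 2(\mathbf{B}\mathbf{x})^T\tilde{\mathbf{e}} < bm), \tag{68}$$

where $\tilde{\mathbf{e}} \sim \mathcal{N}(\mathbf{0}, 0.5\mathbf{I}_m)$, and $\beta_m$ is a constant independent of $\rho$. Using the Chernoff bound,

$$\Pr(|\mathbf{B}\mathbf{x}|^2 - 2(\mathbf{B}\mathbf{x})^T\tilde{\mathbf{e}} < bm) \le \begin{cases} e^{-|\mathbf{B}\mathbf{x}|^2/8}\, e^{bm/4}, & |\mathbf{B}\mathbf{x}|^2 > bm; \\ 1, & |\mathbf{B}\mathbf{x}|^2 \le bm. \end{cases} \tag{69}$$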
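The pairwise bound can be compared against the exact Gaussian tail. The sketch below assumes the bound in (69) has the form $e^{-|\mathbf{B}\mathbf{x}|^2/8}e^{bm/4}$ for $|\mathbf{B}\mathbf{x}|^2 > bm$ and 1 otherwise, and that the noise is Gaussian with covariance $0.5\mathbf{I}$, so that the inner product of $\mathbf{B}\mathbf{x}$ with the noise is a zero-mean Gaussian with variance $0.5|\mathbf{B}\mathbf{x}|^2$. Writing $s = |\mathbf{B}\mathbf{x}|^2$, the exact probability reduces to $\tfrac{1}{2}\,\mathrm{erfc}\big((s - bm)/(2\sqrt{s})\big)$. Function names and the test grid are illustrative.

```python
import math


def exact_probability(s, bm):
    """Pr(s - 2*g < bm) for g ~ N(0, 0.5*s), where s = |Bx|^2 > 0."""
    return 0.5 * math.erfc((s - bm) / (2.0 * math.sqrt(s)))


def chernoff_bound(s, bm):
    """Assumed right-hand side of (69)."""
    if s <= bm:
        return 1.0
    return math.exp(-s / 8.0 + bm / 4.0)


def bound_holds_on_grid(bm_values=(0.5, 1.0, 5.0, 20.0), steps=400):
    """Check exact <= bound for s = 0.25, 0.5, ..., 0.25 * steps."""
    for bm in bm_values:
        for i in range(1, steps + 1):
            s = 0.25 * i
            if exact_probability(s, bm) > chernoff_bound(s, bm) + 1e-15:
                return False
    return True


if __name__ == "__main__":
    print(bound_holds_on_grid())
    print(exact_probability(40.0, 5.0), chernoff_bound(40.0, 5.0))
```

The comparison only illustrates that the assumed exponential form dominates the exact tail on the sampled grid; it says nothing about how tight the bound is after summing over all lattice points.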



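By taking the expectation over the ensemble of random lattices (see [6], Theorem 4),

$$\begin{aligned}
\Pr(E_s) = \mathbb{E}_{\Lambda_c}\{\Pr(E_s|\Lambda_c)\} &\le \frac{\beta_m}{V_c}\left[\int_{|\mathbf{B}\mathbf{x}|^2\le bm} d\mathbf{x} + e^{bm/4}\int_{|\mathbf{B}\mathbf{x}|^2>bm} e^{-|\mathbf{B}\mathbf{x}|^2/8}\, d\mathbf{x}\right] \\
&\le \frac{\beta_m}{V_c}\left[\frac{(\pi bm)^{m/2}}{\Gamma(m/2+1)\det(\mathbf{B}^T\mathbf{B})^{1/2}} + \frac{(8\pi)^{m/2}e^{bm/4}}{\det(\mathbf{B}^T\mathbf{B})^{1/2}}\right].
\end{aligned} \tag{70}$$

Next, we make use of the fact that for nested lattice codes we have that (see [37])

$$|\mathcal{C}(\Lambda_c, \mathcal{R})| = 2^{RT} = \frac{V(\mathcal{R})}{V_c}.$$

Also, it is easy to verify that

$$\det(\mathbf{B}^T\mathbf{B}) = \left(\det\left(\mathbf{I} + \frac{\rho}{M}(\mathbf{H}^c)^H\mathbf{H}^c\right)\right)^{2T}.$$

Denote $R = r\log\rho$ and let $0 \le \lambda_1 \le \cdots \le \lambda_{\min\{M,N\}}$ be the eigenvalues of $(\mathbf{H}^c)^H\mathbf{H}^c$. Then, the bound (70) can
be rewritten as (conditioned on channel statistics)

$$\Pr(E_s|\boldsymbol{\nu}) \le \mathcal{K}(m,b)\,\rho^{-T\left[\sum_{i=1}^{\min\{M,N\}}(1-\nu_i)^+ - r\right]}, \tag{71}$$

where $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_{\min\{M,N\}})$, $\nu_i = -\log\lambda_i/\log\rho$, $(x)^+ = \max\{0, x\}$, and $\mathcal{K}(m,b)$ is a constant
independent of $\rho$. Now, define the set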

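$$\mathcal{B} = \left\{\boldsymbol{\nu} \in \mathbb{R}^{\min\{M,N\}} : \nu_1 \ge \cdots \ge \nu_{\min\{M,N\}} \ge 0,\ \sum_{i=1}^{\min\{M,N\}}(1-\nu_i)^+ < r\right\}. \tag{72}$$

Using (72), the probability of error can be upper bounded as follows:

$$\Pr(E_s) \le \Pr(\boldsymbol{\nu} \in \mathcal{B}) + \Pr(E_s, \boldsymbol{\nu} \in \overline{\mathcal{B}}). \tag{73}$$

The behavior of the first term at high SNR is $\rho^{-d^*_{\mathrm{out}}(r)}$. Averaging the second term over the channels in
the set $\overline{\mathcal{B}}$, we obtain (see ETTl)

$$\Pr(E_s) \,\dot{\le}\, \rho^{-d^*_{\mathrm{out}}(r)} + \int_{\overline{\mathcal{B}}} f_{\boldsymbol{\nu}}(\boldsymbol{\nu}) \Pr(E_s|\boldsymbol{\nu})\, d\boldsymbol{\nu} \,\dot{\le}\, \rho^{-d^*_{\mathrm{out}}(r)}, \tag{74}$$

where $f_{\boldsymbol{\nu}}(\boldsymbol{\nu})$ is the joint probability density function of $\boldsymbol{\nu}$ given by (33).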



Appendix II 
Proof of Theorem 3 



We consider an ensemble of $2MT$-dimensional random lattices $\{\Lambda_c\}$ with fundamental volume $V_c$
satisfying the Minkowski-Hlawka theorem (see [21], Theorem 1). The random lattice codebook is $\mathcal{C}(\Lambda_c, \mathbf{u}_0, \mathcal{R})$,
for some fixed translation vector $\mathbf{u}_0$, where $\mathcal{R}$ is the $2MT$-dimensional sphere of radius $\sqrt{MT}$ centered
at the origin. The average probability of error (averaged over the channel and the lattice ensemble) can be upper
bounded as

$$P_e(\rho) = \mathbb{E}_{\Lambda}\{P_e(\rho|\Lambda)\} \le \mathbb{E}_{\Lambda}\{\Pr(\text{error}, R_b(\rho) > R(\rho))\} + P_{\mathrm{out}}(\rho, b), \tag{75}$$

where $P_e(\rho|\Lambda)$ is the probability of error for a given choice of $\Lambda$. Denote by $0 \le \lambda_1 \le \cdots \le \lambda_M$ the
eigenvalues of $(\mathbf{H}^c)^H\mathbf{H}^c$, and let $R = r\log\rho$. As shown in Section IV.B, by expressing the bias term
$b$ as in (28), the achievable rate of lattice sequential decoding can be written as $R_b = \log\eta$, where
$\eta = \prod_{i=1}^{M}(1+\rho\lambda_i)^{\zeta_i}$. Now, define the asymptotic outage event $\mathcal{B} = \{\boldsymbol{\beta} \in \mathbb{R}_+^M : \sum_{i=1}^{M}\zeta_i(1-\beta_i)^+ < r\}$,
where $\beta_i = -\log\lambda_i/\log\rho$. Then, the first term in the RHS of the above upper bound can be expressed as



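$$\begin{aligned}
\mathbb{E}_{\Lambda}\{\Pr(\text{error}, R_b(\rho) > R(\rho))\} &= \int_{\overline{\mathcal{B}}} f_{\boldsymbol{\beta}}(\boldsymbol{\beta})\, \mathbb{E}_{\Lambda}\{P_e(\rho|\boldsymbol{\beta}, \Lambda)\}\, d\boldsymbol{\beta} \\
&\le \Pr(|\mathbf{e}'|^2 > MT(1+\gamma)) + \int_{\overline{\mathcal{B}}} f_{\boldsymbol{\beta}}(\boldsymbol{\beta}) \Pr(\mathcal{A}|\boldsymbol{\beta})\, d\boldsymbol{\beta},
\end{aligned} \tag{76}$$

where $\gamma > 0$, and $f_{\boldsymbol{\beta}}(\boldsymbol{\beta})$ is the joint probability density function of $\boldsymbol{\beta}$, which is asymptotically given by

$$f_{\boldsymbol{\beta}}(\boldsymbol{\beta}) \doteq \exp\left(-\log(\rho)\sum_{i=1}^{M}(2i-1+|N-M|)\beta_i\right). \tag{77}$$

Consider here the Stack algorithm ($\delta = 0$). In this case, the matrix $\mathbf{B}$ provided in (20) can be expressed
at high SNR as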

B'= (l-bp-^^-^-^B. 

Hence, at high SNR we have 

det(B T B) = (l - 6 p -E^(i-ft)+-]/Mj pE&a-A)* (78) 



As p — > oo, we can express b (see ([28])) as 



,EfIi(i-ft) + /A/ 



1/2M' 



(79) 



Substituting (79) into (78), and by realizing that $R_b > R$, or equivalently $\eta > \rho^r$, we can lower-bound
(78) as $\det(\tilde{\mathbf{B}}^T\tilde{\mathbf{B}}) \ge \eta$. Setting $\mathbf{A} = \tilde{\mathbf{B}}$ in Lemma 1, the ambiguity probability can be upper
bounded as

$$\Pr(\mathcal{A}|\boldsymbol{\beta}) \le \exp(-T[\log\eta - r\log\rho]). \tag{80}$$
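It has been shown in El that for $T \ge M+N-1$, the SNR exponent of $\Pr(|\mathbf{e}'|^2 > MT(1+\gamma))$ with
respect to $\log\rho$ is larger than $d^*_{\mathrm{out}}(r) \ge d_b(r)$. Substituting (80) in (76), we get (for $T \ge M+N-1$)

$$\mathbb{E}_{\Lambda}\{\Pr(\text{error}, R_b(\rho) > R(\rho))\} \,\dot{\le}\, \int_{\overline{\mathcal{B}}} \exp\left(-\log(\rho)\left[\sum_{i=1}^{M}(2i-1+|N-M|)\beta_i + T\left(\sum_{i=1}^{M}\zeta_i(1-\beta_i)^+ - r\right)\right]\right) d\boldsymbol{\beta} \doteq \rho^{-d_b(r)}. \tag{81}$$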

Appendix III: 
Proof of Lemma 1 

Without loss of generality, we assume that the all-zero lattice point was transmitted. Let 



1, if {e'l - R^zH 2 < bk + R 2 s , le'll 2 < Rj; 
0, otherwise. 



(82) 



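where $(\mathbf{e}')_1^k$ denotes the last $k$ components of $\mathbf{e}' = \mathbf{Q}^T\mathbf{e}$, $\mathbf{Q}$ is the orthogonal matrix of the QR decomposition
of $\mathbf{B}\mathbf{G}$, and $R_s^2 = MT(1 + \log\rho)$. Given that $|\mathbf{e}'|^2 \le R_s^2$, it must follow that $|(\mathbf{e}')_1^k|^2 \le R_s^2$ for all $1 \le k \le m$.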



The total number of integer lattice points that satisfy ( 82 ) can be upper bounded by 



(83) 



where 



1, if \e'1 - R kk z\\ 2 <bk + R 2 S , le'^l 2 <bk + R 2 ; 



0, otherwise. 



(84) 



In general one can show that for any random vectors u and v, and r s > 0, it holds{|tt — v\ 2 < r 2 , \v\ 2 < 
r s} ^ {\ v \ 2 < 4r 2 }. Therefore, one can easily show that 

S k < (85) 



where 

j\ if \R kk z1\ 2 <4(bk + R 2 s ); 
(*?) ={ (86) 
0, otherwise. 

We can further upper bound $S_k$ by introducing an auxiliary random variable that has a uniform
distribution in the Voronoi region of the lattice $\Lambda(\mathbf{R}_{kk})$. This can be done as follows.
Let

1, Ix^+uW 2 < 7(bk + R 2 ) 
0, otherwise 



where u\ is a random variable that is uniformly distributed in Vo(R kk ) and independent of x\. Then, 
assuming that there exists at least one lattice point x\ ^ inside the sphere, one can show that 

S k < Yl <f>( x i+ u i) 
4eA(i? fcfc ) 



The indicator function in ( 86 ) can be rewritten as 



1, \x\\ 2 <4{bk + R 2 ), + u\) - u\\ 2 < A{bk + R 2 ) 

0, otherwise 

1, \x\ | 2 < A(bk + R 2 ), \x\ + u\\ 2 < A(bk + R 2 ) + 2ufx k 1 + \u k \ 2 
0, otherwise 

where u\ is a uniform random variable in the fundamental region of the lattice A(R kk ). By noting that 

\u k \ 2 < (bk + R 2 ) [since u\ e V (A(12 fcfc ))], and < < (6Jfe + R 2 S ) (since |as*| < R s ), we 

then have 

x*£A(R kk ) x^eA(R kk ) 



Equivalently, we have that 

x%eA(R kh ) 



Now, taking the average in both sides of (87) over u\ G Vo(Rkk) we have (see Lemma 2 in j6]|) 



V f (A(R kk ))

Fig. 1. DMT curves $d_b(r)$ achieved by the lattice Fano/Stack sequential decoder for the case of a $2 \times 2$ MIMO channel for different values of $(\zeta_1, \zeta_2)$.

(b) DMT curves corresponding to Case 2.

Fig. 2. DMT curves $d_b(r)$ achieved by the lattice Fano/Stack sequential decoder for different bias $b$.



Fig. 3. Performance comparison between naive and MMSE-DFE lattice sequential decoding with $b = 0.6$ for the case of a $2 \times 2$ LAST
coded MIMO channel with $T = 3$ and $R = 4$ bpcu (horizontal axis: SNR in dB).

Fig. 4. Outage probability and error rate performance of lattice sequential decoding with $b = 1$.

Fig. 6. Comparison of average computational complexity achieved by lattice sequential decoding for several values of $b$.




Fig. 7. (a) Performance and (b) complexity distribution achieved by the MMSE-DFE lattice sequential decoder ($b = 0.6$) for the case of a
$2 \times 2$ LAST coded MIMO channel.

Fig. 8. (a) Performance and (b) average computational complexity comparison between sphere decoding and lattice sequential decoding
for a signal with dimension $m = 12$.

Fig. 9. (a) Performance and (b) average computational complexity comparison between sphere decoding and lattice sequential decoding
for a signal with dimension $m = 30$.

Fig. 10. The average computational complexity of the MMSE-DFE lattice Stack sequential decoder in a LAST coded $2 \times 2$ MIMO system
with different codeword lengths $T = 3, 4, 5$ and fixed rate $R$. All curves decay quickly to $m = 12, 16, 20$, respectively, as SNR increases.




Fig. 11. Plots of the average complexity of the lattice sequential decoder for an optimal nested LAST coded 2x2 MIMO system with 
different rates R in bpcu. 



